release 0.1
Sunday, 8 October 1995
The single feature most sorely lacking in the Perl programming language prior to its 5.0 release was complex data structures. Even without direct language support, some valiant programmers did manage to emulate them, but it was hard work and not for the faint of heart. You could occasionally get away with the $m{$a,$b} notation borrowed from awk in which the keys are actually more like a single concatenated string "$a$b", but traversal and sorting were difficult. More desperate programmers even hacked Perl's internal symbol table directly, a strategy that proved hard to develop and maintain--to put it mildly.
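To make the awk-style emulation concrete, here is a minimal sketch. The variable names are made up for illustration. It shows that a two-subscript hash access stores a single key: the subscripts joined by the special variable $; (the subscript separator).

```perl
use strict;
use warnings;

my %m;
my ($x, $y) = (3, 4);
$m{$x, $y} = "cell";            # really $m{ join($;, $x, $y) }

# Build the same key by hand and look it up.
my $joined = join $;, $x, $y;
print exists $m{$joined} ? "same key\n" : "different key\n";   # same key
print scalar(keys %m), "\n";                                   # 1
```

Because the hash holds only that one flat string key, pulling out "all of row 3" means scanning and splitting every key, which is why traversal and sorting were so awkward.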
The 5.0 release of Perl let us have complex data structures. You may now write something like this and, all of a sudden, you have an array with three dimensions!
    for $x (1 .. 10) {
        for $y (1 .. 10) {
            for $z (1 .. 10) {
                $LoL[$x][$y][$z] = $x ** $y + $z;
            }
        }
    }

Alas, however simple this may appear, underneath it's a much more elaborate construct than meets the eye!
How do you print it out? Why can't you just say print @LoL? How do you sort it? How can you pass it to a function or get one of these back from a function? Is it an object? Can you save it to disk to read back later? How do you access whole rows or columns of that matrix? Do all the values have to be numeric?
As you see, it's quite easy to become confused. While some small portion of the blame for this can be attributed to the reference-based implementation, it's really more due to a lack of existing documentation with examples designed for the beginner.
This document is meant to be a detailed but understandable treatment of the many different sorts of data structures you might want to develop. It should also serve as a cookbook of examples. That way, when you need to create one of these complex data structures, you can just pinch, pilfer, or purloin a drop-in example from here.
Let's look at each of these possible constructs in detail. There are separate documents on
@ARRAYs and %HASHes are all internally one-dimensional. They can only hold scalar values (meaning a string, number, or a reference). They cannot directly contain other arrays or hashes, but instead contain references to other arrays or hashes.

You can't use a reference to an array or hash in quite the same way that you would a real array or hash. For C or C++ programmers unused to distinguishing between arrays and pointers to the same, this can be confusing. If so, just think of it as the difference between a structure and a pointer to a structure.
You can (and should) read more about references in the perlref(1) manpage. Briefly, references are rather like pointers that know what they point to. (Objects are also a kind of reference, but we won't be needing them right away--if ever.) That means that when you have something that looks to you like an access to a two-or-more-dimensional array and/or hash, what's really going on is that the base type is merely a one-dimensional entity that contains references to the next level. It's just that you can use it as though it were a two-dimensional one. This is actually the way almost all C multidimensional arrays work as well.
    $list[7][12]                        # array of arrays
    $list[7]{string}                    # array of hashes
    $hash{string}[7]                    # hash of arrays
    $hash{string}{'another string'}     # hash of hashes

Now, because the top level only contains references, if you try to print out your array with a simple print() function, you'll get something that doesn't look very nice, like this:
    @LoL = ( [2, 3], [4, 5, 7], [0] );
    print $LoL[1][2];
  7
    print @LoL;
  ARRAY(0x83c38)ARRAY(0x8b194)ARRAY(0x8b1d0)
That's because Perl doesn't (ever) implicitly dereference your variables. If you want to get at the thing a reference is referring to, then you have to do this yourself using either prefix typing indicators, like ${$blah}, @{$blah}, @{$blah[$i]}, or else postfix pointer arrows, like $a->[3], $h->{fred}, or even $ob->method()->[3].
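As a minimal sketch of doing the dereferencing yourself, the following uses the same small @LoL as above, first to confirm that each top-level element is an array reference, and then to print every row by dereferencing it with @$_. The @lines variable is just a name chosen for this illustration.

```perl
use strict;
use warnings;

my @LoL = ( [2, 3], [4, 5, 7], [0] );

# Each top-level element is a reference to an anonymous array.
print ref($LoL[1]), "\n";                 # ARRAY

# Dereference each row explicitly to get at its contents.
my @lines = map { join(" ", @$_) } @LoL;
print "$_\n" for @lines;                  # "2 3", then "4 5 7", then "0"
```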
Here's the case where you get a count instead of a nested array:

    for $i (1..10) {
        @list = somefunc($i);
        $LoL[$i] = @list;       # WRONG!
    }

That's just the simple case of assigning an array to a scalar and getting its element count. If that's what you really and truly want, then you might do well to consider being a tad more explicit about it, like this:
    for $i (1..10) {
        @list = somefunc($i);
        $counts[$i] = scalar @list;
    }

Here's the case of taking a reference to the same memory location again and again:
    for $i (1..10) {
        @list = somefunc($i);
        $LoL[$i] = \@list;      # WRONG!
    }

So, just what's the big problem with that? It looks right, doesn't it? After all, I just told you that you need an array of references, so by golly, you've made me one!
Unfortunately, while this is true, it's still broken. All the references in @LoL refer to the very same place, and they will therefore all hold whatever was last in @list! It's similar to the problem demonstrated in the following C program:
    #include <pwd.h>
    #include <stdio.h>

    int main(void) {
        struct passwd *rp, *dp;
        rp = getpwnam("root");
        dp = getpwnam("daemon");

        /* Both calls return a pointer to the same static buffer. */
        printf("daemon name is %s\nroot name is %s\n",
               dp->pw_name, rp->pw_name);
        return 0;
    }

Which will print
    daemon name is daemon
    root name is daemon
The problem is that both rp and dp are pointers to the same location in memory! In C, you'd have to remember to malloc() yourself some new memory. In Perl, you'll want to use the array constructor [] or the hash constructor {} instead. Here's the right way to do the preceding broken code fragments:
    for $i (1..10) {
        @list = somefunc($i);
        $LoL[$i] = [ @list ];
    }

The square brackets make a reference to a new array with a copy of what's in @list at the time of the assignment. This is what you want.
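To see the difference side by side, here is a small sketch that fills one array with \@list and another with [ @list ] in the same loop. The somefunc() below is only a hypothetical stand-in (it returns $i copies of $i), and @shared and @copied are names invented for this illustration.

```perl
use strict;
use warnings;

# Hypothetical stand-in for somefunc(): returns $i copies of $i.
sub somefunc { my ($i) = @_; return ($i) x $i }

my (@list, @shared, @copied);
for my $i (1 .. 3) {
    @list = somefunc($i);
    $shared[$i] = \@list;       # every slot points at the one @list
    $copied[$i] = [ @list ];    # every slot gets its own fresh copy
}

print scalar @{ $shared[1] }, "\n";   # 3: sees the last contents of @list
print scalar @{ $copied[1] }, "\n";   # 1: kept its own snapshot
```

Every element of @shared is the very same reference, so changing @list later would change "all" the rows at once.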
Note that this will produce something similar, but it's much harder to read:
    for $i (1..10) {
        @list = 0 .. $i;
        @{ $LoL[$i] } = @list;
    }

Is it the same? Well, maybe so--and maybe not. The subtle difference is that when you assign something in square brackets, you know for sure it's always a brand new reference with a new copy of the data. Something else could be going on in this new case with the @{$LoL[$i]} dereference on the left-hand side of the assignment. It all depends on whether $LoL[$i] had been undefined to start with, or whether it already contained a reference. If you had already populated @LoL with references, as in
    $LoL[3] = \@another_list;

Then the assignment with the indirection on the left-hand side would use the existing reference that was already there:

    @{ $LoL[3] } = @list;

Of course, this would have the ``interesting'' effect of clobbering @another_list. (Have you ever noticed how when a programmer says something is ``interesting'', that rather than meaning ``intriguing'', they're disturbingly more apt to mean that it's ``annoying'', ``difficult'', or both? :-)
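The clobbering is easy to reproduce. In this sketch (the sample values are invented for illustration), slot 3 already holds a reference when the assignment happens, while slot 4 is still undefined:

```perl
use strict;
use warnings;

my @another_list = ("keep", "me");
my @list         = (1, 2, 3);
my @LoL;

$LoL[3] = \@another_list;     # slot already holds a reference
@{ $LoL[3] } = @list;         # assigns through that reference
print "@another_list\n";      # 1 2 3: the original contents are gone

@{ $LoL[4] } = @list;         # undefined slot: Perl creates a new array
print ref($LoL[4]), "\n";     # ARRAY
```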
So just remember to always use the array or hash constructors with [] or {}, and you'll be fine, although it's not always optimally efficient.
Surprisingly, the following dangerous-looking construct will actually work out fine:
    for $i (1..10) {
        my @list = somefunc($i);
        $LoL[$i] = \@list;
    }

That's because my() is more of a run-time statement than it is a compile-time declaration per se. This means that the my() variable is remade afresh each time through the loop. So even though it looks as though you stored the same variable reference each time, you actually did not! This is a subtle distinction that can produce more efficient code at the risk of misleading all but the most experienced of programmers. So I usually advise against teaching it to beginners. In fact, except for passing arguments to functions, I seldom like to see the gimme-a-reference operator (backslash) used much at all in code. Instead, I advise beginners that they (and most of the rest of us) should try to use the much more easily understood constructors [] and {} instead of relying upon lexical (or dynamic) scoping and hidden reference-counting to do the right thing behind the scenes.
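If you'd like to convince yourself that my() really does hand you a new array each time around, a quick check along these lines should do it. The list 1 .. $i is just filler data chosen so that each row has a different length.

```perl
use strict;
use warnings;

my @LoL;
for my $i (1 .. 3) {
    my @list = (1 .. $i);     # a new @list each time through the loop
    $LoL[$i] = \@list;
}

# Each slot refers to a different array, so the rows keep their own sizes.
print join(",", map { scalar @{ $LoL[$_] } } 1 .. 3), "\n";   # 1,2,3
```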
In summary:
    $LoL[$i] = [ @list ];       # usually best
    $LoL[$i] = \@list;          # perilous; just how my() was that list?
    @{ $LoL[$i] } = @list;      # way too tricky for most programmers

Speaking of things like @{$LoL[$i]}, the following are actually the same thing:

    $listref->[2][2]    # clear
    $$listref[2][2]     # confusing
That's because Perl's precedence rules on its five prefix dereferencers (which look like someone swearing: $ @ * % &) make them bind more tightly than the postfix subscripting brackets or braces! This will no doubt come as a great shock to the C or C++ programmer, who is quite accustomed to using *a[i] to mean what's pointed to by the i'th element of a. That is, they first take the subscript, and only then dereference the thing at that subscript. That's fine in C, but this isn't C.
The seemingly equivalent construct in Perl, $$listref[$i], first does the deref of $listref, making it take $listref as a reference to an array, and then dereferences that, and finally tells you the i'th value of the array pointed to by $listref. If you wanted the C notion, you'd have to write ${$LoL[$i]} to force the $LoL[$i] to get evaluated first before the leading $ dereferencer.
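Here is a small sketch of the two readings. For illustration only, it uses a hypothetical array @refs of scalar references to play the role that $LoL[$i] plays in the text, so that the "subscript first, then dereference" form has something to dereference.

```perl
use strict;
use warnings;

my $listref = [ "zero", "one", "two" ];   # a reference to an array
my @refs    = ( \"x", \"y", \"z" );       # an array of scalar references
my $i       = 2;

# Dereference $listref first, then subscript the resulting array.
print $$listref[$i], "\n";      # two (same as $listref->[$i])

# Subscript @refs first, then dereference the element found there.
print ${ $refs[$i] }, "\n";     # z
```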
To avoid these pitfalls, start every program like this:

    #!/usr/bin/perl -w
    use strict;
This way, you'll be forced to declare all your variables with my() and also disallow accidental ``symbolic dereferencing''. Therefore if you'd done this:
    my $listref = [
        [ "fred", "barney", "pebbles", "bambam", "dino", ],
        [ "homer", "bart", "marge", "maggie", ],
        [ "george", "jane", "alroy", "judy", ],
    ];

    print $listref[2][2];
The compiler would immediately flag that as an error at compile time, because you were accidentally accessing @listref, an undeclared variable, and it would thereby remind you to instead write:
    print $listref->[2][2];
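You can watch the compiler object without killing your script by compiling the mistaken line inside a string eval. This is only a sketch: the nested list is shortened, and $ok and $err are names invented for the illustration.

```perl
use strict;
use warnings;

# Compile the mistaken snippet inside a string eval so the error can be
# captured instead of aborting this script.
my $ok = eval q{
    use strict;
    my $listref = [ [ "fred" ], [ "homer" ], [ "george", "jane", "alroy" ] ];
    my $name = $listref[2][2];      # oops: this names @listref
    1;
};
my $err = $@;
print "compile failed: $err" unless $ok;

# The corrected form dereferences the scalar $listref.
my $listref = [ [ "fred" ], [ "homer" ], [ "george", "jane", "alroy" ] ];
print $listref->[2][2], "\n";       # alroy
```

The captured message names @listref as the undeclared variable, which is exactly the hint you need.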
The version for the debugger has several important new features, including command line editing as well as the x and X commands to dump out complex data structures. For example, given the same assignment as above, but to a variable named $LoL, here's the debugger output:
    DB<1> X $LoL
    $LoL = ARRAY(0x13b5a0)
       0  ARRAY(0x1f0a24)
          0  'fred'
          1  'barney'
          2  'pebbles'
          3  'bambam'
          4  'dino'
       1  ARRAY(0x13b558)
          0  'homer'
          1  'bart'
          2  'marge'
          3  'maggie'
       2  ARRAY(0x13b540)
          0  'george'
          1  'jane'
          2  'alroy'
          3  'judy'
There's also a lower-case x command which is nearly the same.