[Chapter 4] 4.6 A Brief Tutorial: Manipulating Lists of Lists

4.6 A Brief Tutorial: Manipulating Lists of Lists

There are many kinds of nested data structures. The simplest kind to build is a list of lists (also called an array of arrays, or a multi-dimensional array). It's reasonably easy to understand, and almost everything that applies here will also be applicable to the fancier data structures.

4.6.1 Composition and Access

Here's how to put together a two-dimensional array value:

# assign to an array a list of list references
@LoL = ( 
       [ "fred", "barney" ],
       [ "george", "jane", "elroy" ],
       [ "homer", "marge", "bart" ],
);

print $LoL[2][2];   # prints "bart"

The overall list is enclosed by parentheses, not brackets. That's because you're assigning a list to an array. If you didn't want the result to be a list, but rather a reference to an array, then you would use brackets on the outside:

# assign to a scalar variable a reference to a list of list references
$ref_to_LoL = [
    [ "fred", "barney", "pebbles", "bambam", "dino", ],
    [ "homer", "bart", "marge", "maggie", ],
    [ "george", "jane", "elroy", "judy", ],
];

print $ref_to_LoL->[2][2];   # prints "elroy"

$ref_to_LoL is a reference to an array, whereas @LoL is an array proper. The parentheses (indicating a list) have changed to brackets (indicating the creation of a reference to an array). Unlike C, Perl doesn't allow you to freely interchange arrays with references to arrays. This is a feature.

Remember that there is an implied -> between every pair of adjacent braces or brackets. Therefore these two lines:

$LoL[2][2]
$ref_to_LoL->[2][2]

are equivalent to these two lines:

$LoL[2]->[2]
$ref_to_LoL->[2]->[2]

There is, however, no implied -> before the first pair of brackets, which is why the dereference of $ref_to_LoL requires the ->.

4.6.2 Growing Your Own

Now those big list assignments are well and good for creating a fixed data structure, but what if you want to calculate each element on the fly, or otherwise build the structure piecemeal?

First, let's look at reading a data structure in from a file. We'll assume that there's a flat file in which each line is a row of the structure, and each word an element. Here's how to proceed:

while (<>) {
    @tmp = split;
    push @LoL, [ @tmp ];
}

You can also load the array from a function:

for $i ( 1 .. 10 ) {
    @tmp = somefunc($i);
    $LoL[$i] = [ @tmp ];
}

Of course, you don't need to name the temporary array:

while (<>) {
    push @LoL, [ split ];
}

and:

for $i ( 1 .. 10 ) {
    $LoL[$i] = [ somefunc($i) ];
}

You also don't have to use push. You could keep track of where you are in the array, and assign each line of the file to the appropriate row of the array:

my (@LoL, $i, $line);
for $i ( 0 .. 10 ) { # just first 11 lines 
    $line = <>;
    $LoL[$i] = [ split ' ', $line ];
}

Simplifying, you can avoid the assignment of the line to a mediating variable:

my (@LoL, $i);
for $i ( 0 .. 10 ) { # just first 11 lines
    $LoL[$i] = [ split ' ', <> ];
}

In general, you should be leery of using potential list functions like <> in a scalar context without explicitly stating such. The following example would be clearer to the casual reader:

my (@LoL, $i);
for $i ( 0 .. 10 ) { # just first 11 lines
    $LoL[$i] = [ split ' ', scalar(<>) ];
}

If you want a $ref_to_LoL variable as a reference to an array, do something like:

my $ref_to_LoL;
while (<>) {
    push @$ref_to_LoL, [ split ];
}

So much for adding new rows to the list of lists. What about adding new columns? If you're just dealing with matrices, it's often easiest to use simple assignment:

for $x (1 .. 10) {
    for $y (1 .. 10) {
        $LoL[$x][$y] = func($x, $y);
    }
}

for $x ( 3, 7, 9 ) {
    $LoL[$x][20] += func2($x);
}

It doesn't matter whether the subscripted elements of @LoL are already there or not; Perl will gladly create them for you, setting intervening elements to the undefined value as need be. If you just want to append to a row, you have to do something a bit funnier looking:

# add new columns to an existing row
push @{ $LoL[0] }, "wilma", "betty";

Notice that this wouldn't work:

push $LoL[0], "wilma", "betty";  # WRONG!

In fact, that wouldn't even compile, because the argument to push must be a real array, not just a reference to an array. Therefore, the first argument absolutely must begin with an @ character. What comes after the @ is somewhat negotiable.

4.6.3 Access and Printing

Now it's time to print your data structure. If you only want one element, do this:

print $LoL[0][0];

If you want to print the whole thing, though, you can't just say:

print @LoL;         # WRONG

because you'll get references listed, and Perl will never automatically dereference thingies for you. Instead, you have to roll yourself a loop or two. The following code prints the whole structure, using the shell-style for construct to loop through the outer set of subscripts:

for $array_ref ( @LoL ) {
    print "\t [ @$array_ref ],\n";
}

Beware of the brackets. In this and the following example, the (non-subscripting) brackets do not indicate the creation of a reference. The brackets occur inside a quoted string, not in a place where a term is expected, and therefore lose their special meaning. They are just part of the string that print outputs.

If you want to keep track of subscripts, you might do this:

for $i ( 0 .. $#LoL ) {
    print "\t element $i is [ @{$LoL[$i]} ],\n";
}

or maybe even this (notice the inner loop):

for $i ( 0 .. $#LoL ) {
    for $j ( 0 .. $#{$LoL[$i]} ) {
        print "element $i $j is $LoL[$i][$j]\n";
    }
}

As you can see, things are getting a bit complicated. That's why sometimes it's easier to use a temporary variable on your way through:

for $i ( 0 .. $#LoL ) {
    $aref = $LoL[$i];
    for $j ( 0 .. $#{$aref} ) {
        print "element $i $j is $aref->[$j]\n";
    }
}

But that's still a bit ugly. How about this:

for $i ( 0 .. $#LoL ) {
    $aref = $LoL[$i];
    $n = @$aref - 1;
    for $j ( 0 .. $n ) {
        print "element $i $j is $aref->[$j]\n";
    }
}

4.6.4 Slices

If you want to get at a slice (part of a row) in a multi-dimensional array, you're going to have to do some fancy subscripting. That's because, while we have a nice synonym for a single element via the pointer arrow, no such convenience exists for slices. However, you can always write a loop to do a slice operation.

Here's how to create a one-dimensional slice of one subarray of a two-dimensional array, using a loop. We'll assume a list-of-lists variable (rather than a reference to a list of lists):

@part = ();
$x = 4;     
for ($y = 7; $y < 13; $y++) {
    push @part, $LoL[$x][$y];
}

That same loop could be replaced with a slice operation:

@part = @{ $LoL[4] } [ 7..12 ];

If you want a two-dimensional slice, say, with $x running from 4..8 and $y from 7..12, here's one way to do it:

@newLoL = ();
for ($startx = $x = 4; $x <= 8; $x++) {
    for ($starty = $y = 7; $y <= 12; $y++) {
        $newLoL[$x - $startx][$y - $starty] = $LoL[$x][$y];
    }
}

In this example, the individual values within each subarray of @newLoL are assigned one by one, taken from the appropriate locations in @LoL. An alternative is to create anonymous arrays, each consisting of a desired slice of a subarray of @LoL, and then put references to these anonymous arrays into @newLoL. So we are writing references into @newLoL (subscripted once, so to speak) instead of subarray values into a twice-subscripted @newLol. This method eliminates the innermost loop:

for ($x = 4; $x <= 8; $x++) {
    push @newLoL, [ @{ $LoL[$x] } [ 7..12 ] ];
}

Of course, if you do this very often, you should probably write a subroutine called something like extract_rectangle().

4.6.5 Common Mistakes

As mentioned previously, every array or hash in Perl is implemented in one dimension. "Multi-dimensional" arrays, too, are one-dimensional, but the values in this one-dimensional array are references to other data structures. If you print these values out without dereferencing them, you will get the references rather than the data referenced. For example, these two lines:

@LoL = ( [2, 3], [4, 5, 7], [0] );
print "@LoL";

result in:

ARRAY(0x83c38) ARRAY(0x8b194) ARRAY(0x8b1d0)

On the other hand, this line:

print $LoL[1][2];

yields 7 as output.

Perl dereferences your variables only when you employ one of the dereferencing mechanisms. But remember that $LoL[1][2] is just a convenient way to write $LoL[1]->[2], which in turn is a convenient way to write ${$LoL[1]}[2]. Indeed, you could write all your dereferencing operations with braces, but that would be uglier than ugly. Use the syntactic sugar Perl provides to sweeten your program.

@LoL was defined as an array whose values happened to be references. Here's a similar-looking, but very different case:

my $listref = [
    [ "fred", "barney", "pebbles", "bambam", "dino", ],
    [ "homer", "bart", "marge", "maggie", ],
    [ "george", "jane", "elroy", "judy", ],
];

print $listref[2][2];   # WRONG!

Here, $listref is not an array, but a scalar variable referring to an array - in this case, referring to an anonymous, multi-dimensional array, the one created by the outer brackets. Therefore, to print elroy in this example, we should have said:

print $listref->[2][2];

By contrast, $listref[2] in the erroneous print statement is the second element in a not-yet-declared array. If you ask to

use strict 'vars'; # or just use strict

then the use of the undeclared array will be flagged as an error at compile time.

In constructing an array of arrays, remember to take a reference for the daughter arrays. Otherwise, you will just create an array containing the element counts of the daughter arrays, like this:

for $i (1..10) {
    @list = somefunc($i);
    $LoL[$i] = @list;       # WRONG!
}

Here @list is being accessed in a scalar context, and therefore yields a count of its elements, which is assigned to $LoL[$i]. The proper way to take the reference will be shown in a moment.

Another common error involves taking a reference to the same memory location over and over again:

for $i (1..10) {
    @list = somefunc($i);
    $LoL[$i] = \@list;      # WRONG!
}

Every reference generated by the second line of the for loop is the same, namely, a reference to the single array @list. Yes, this array is being given a different set of values on each pass through the loop, but when everything is said and done, $LoL contains a set of identical references to the same array, which now holds the last set of values that were assigned to it.

Here's a more successful approach:

for $i (1..10) {
    @list = somefunc($i);
    $LoL[$i] = [ @list ];
}

The brackets make a reference to a new array with a copy of what's in @list at the time of the assignment.

A similar result - though much more difficult to read - would be produced by:

for $i (1..10) {
    @list = somefunc($i);
    @{$LoL[$i]} = @list;
}

Since $LoL[$i] needs to be a reference, the reference springs into existence. Then, the preceding @ dereferences this new reference, with the result that the values of @list are assigned (in list context) to the array referenced by $LoL[$i]. For clarity's sake, you might wish to avoid this construct.

But there is a situation in which you might use it. Suppose @LoL is already an array of references to arrays. That is, suppose you had made assignments like:

$LoL[3] = \@original_list;

And now suppose that you want to change @original_list (that is, you want to change the fourth row of $LoL) so that it refers to the elements of @list. This code will work:

@{$LoL[3]} = @list;

In this case, the reference itself does not change, but the elements of the array being referred to do. You need to be aware, however, that this approach overwrites the values of @original_list.

Finally, the following dangerous-looking code actually works fine:

for $i (1..10) {
    my @list = somefunc($i);
    $LoL[$i] = \@list;
}

That's because the my variable is created afresh each time through the loop. So even though it looks as though you stored the same variable reference each time, you actually did not. This is a subtle distinction, but the technique can produce more efficient code, at the risk of misleading less enlightened programmers. It's more efficient because there's no copy in the final assignment. On the other hand, if you have to copy the values anyway (which the first assignment above is doing), then you might as well use the copy implied by the brackets and avoid the temporary variable:

for $i (1..10) {
    $LoL[$i] = [ somefunc($i) ];
}

In summary:

$LoL[$i] = [ @list ];   # safest, sometimes fastest
$LoL[$i] = \@list;      # fast but risky, depends on my-ness of list
@{ $LoL[$i] } = @list;  # too tricky for most uses