11. References and Records

Contents:
Introduction
Taking References to Arrays
Making Hashes of Arrays
Taking References to Hashes
Taking References to Functions
Taking References to Scalars
Creating Arrays of Scalar References
Using Closures Instead of Objects
Creating References to Methods
Constructing Records
Reading and Writing Hash Records to Text Files
Printing Data Structures
Copying Data Structures
Storing Data Structures to Disk
Transparently Persistent Data Structures
Program: Binary Trees

With as little a web as this will I ensnare as great a fly as Cassio.

- Shakespeare, Othello, Act II, scene i

11.0. Introduction

Perl provides three fundamental data types: scalars, arrays, and hashes. It's certainly possible to write many programs without recourse to complex records, but most programs need something more complex than simple variables and lists.

Perl's three built-in types combine with references to produce arbitrarily complex and powerful data structures, the records that users of ancient versions of Perl desperately yearned for. Selecting the proper data structure and algorithm can make the difference between an elegant program that does its job quickly and an ungainly concoction that's glacially slow to execute and consumes system resources voraciously.

The first part of this chapter shows how to create and use plain references. The second part shows how to use references to create higher-order data structures.

References

To grasp the concept of references, you must first understand how Perl stores values in variables. Each defined variable has a name and the address of a chunk of memory associated with it. This idea of storing addresses is fundamental to references because a reference is a value that holds the location of another value. The scalar value that contains the memory address is called a reference. Whatever value lives at that memory address is called a referent. (You may also call it a "thingie" if you prefer to live a whimsical existence.) See Figure 11.1.

The referent could be any of Perl's built-in types (scalar, array, hash, ref, code, or glob) or a user-defined type based on one of the built-in ones.

Figure 11.1: Reference and referent

Referents in Perl are typed. This means you can't treat a reference to an array as though it were a reference to a hash, for example. Attempting to do so produces a runtime exception. No mechanism for type casting exists in Perl. This is considered a feature.

So far, it may look as though a reference were little more than a raw address with strong typing. But it's far more than that. Perl takes care of automatic memory allocation and deallocation (garbage collection) for references, just as it does for everything else. Every chunk of memory in Perl has a reference count associated with it, representing how many places know about that referent. The memory used by a referent is not returned to the process's free pool until its reference count reaches zero. This ensures that you never have a reference that isn't valid - no more core dumps and general protection faults from mismanaged pointers as in C.

Freed memory is returned to Perl for later use, but few operating systems reclaim it and decrease the process's memory footprint. This is because most memory allocators use a stack, and if you free up memory in the middle of the stack, the operating system can't take it back without moving the rest of the allocated memory around. That would destroy the integrity of your pointers and blow XS code out of the water.

To follow a reference to its referent, preface the reference with the appropriate type symbol for the data you're accessing. For instance, if $sref is a reference to a scalar, you can say:

print $$sref;    # prints the scalar value that the reference $sref refers to
$$sref = 3;      # assigns to $sref's referent

To access one element of an array or hash whose reference you have, use the infix pointer-arrow notation, as in $rv->[37] or $rv->{"wilma"}. Besides dereferencing array references and hash references, the arrow is also used to call an indirect function through its reference, as in $code_ref->("arg1", "arg2"); this is discussed in Recipe 11.4. If you're using an object, use an arrow to call a method, $object->methodname("arg1", "arg2"), as shown in Chapter 13, Classes, Objects, and Ties.
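As a minimal sketch of the arrow notation, the following uses made-up sample data; the variable names $aref, $href, and $code_ref and their contents are assumptions chosen only for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample data, not taken from the text.
my $aref     = [ 10, 20, 30 ];
my $href     = { wilma => "flintstone", betty => "rubble" };
my $code_ref = sub { return join "-", @_ };

my $second = $aref->[1];                   # element access through an array reference
my $wilma  = $href->{"wilma"};             # value lookup through a hash reference
my $joined = $code_ref->("arg1", "arg2");  # indirect call through a code reference

print "$second $wilma $joined\n";
```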

Perl's syntax rules make dereferencing complex expressions tricky - it falls into the category of "hard things that should be possible." Mixing right-associative and left-associative operators doesn't work out well. For example, $$x[4] is the same as $x->[4]; that is, it treats $x as a reference to an array and then extracts the element at index 4 (the fifth element) from that. This could also have been written ${$x}[4]. If you really meant "take the fifth element of @x and dereference it as a scalar reference," then you need to use ${$x[4]}. You should avoid putting two type signs ($@%&) side-by-side, unless it's simple and unambiguous like %hash = %$hashref.
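To make the distinction concrete, here is a small sketch in which a scalar $x (an array reference) and an unrelated array @x coexist. The data is invented for illustration.

```perl
use strict;
use warnings;

my $x      = [ "a", "b", "c", "d", "e" ];   # $x holds a reference to an array
my $target = "pointed at";
my @x      = (0, 1, 2, 3, \$target);        # @x is a separate array; $x[4] is a scalar ref

my $from_ref_1 = $$x[4];       # uses the scalar $x: same as $x->[4]
my $from_ref_2 = ${$x}[4];     # the same thing with explicit braces
my $from_ref_3 = $x->[4];      # the same thing with the arrow
my $from_array = ${ $x[4] };   # uses the array @x: element 4, dereferenced as a scalar ref

print "$from_ref_1 $from_ref_2 $from_ref_3 / $from_array\n";
```

The first three expressions all read through the scalar $x; only the last one touches @x.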

In the simple cases using $$sref above, you could have written:

print ${$sref};             # prints the scalar $sref refers to
${$sref} = 3;               # assigns to $sref's referent

For safety, some programmers use this notation exclusively.

When passed a reference, the ref function returns a string describing its referent. This string is usually one of SCALAR, ARRAY, HASH, or CODE, although the other built-in types of GLOB, REF, IO, Regexp, and LVALUE also occasionally appear. If you call ref on a non-reference, it returns an empty string, which is false. If you call ref on an object (a reference whose referent has been blessed), it returns the class the object was blessed into: CGI, IO::Socket, or even ACME::Widget.
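A short sketch of what ref reports for several kinds of values follows. The variable names are assumptions, and ACME::Widget is used here only as a class name to bless into; no such module is loaded.

```perl
use strict;
use warnings;

my $text   = "plain string";
my $object = bless({}, "ACME::Widget");   # class name only; no module is required

my $kind_scalar = ref(\$text);            # "SCALAR"
my $kind_array  = ref([ 1, 2, 3 ]);       # "ARRAY"
my $kind_hash   = ref({ a => 1 });        # "HASH"
my $kind_code   = ref(sub { 1 });         # "CODE"
my $kind_ref    = ref(\\$text);           # "REF": a reference to a reference
my $kind_plain  = ref($text);             # "": not a reference
my $kind_object = ref($object);           # "ACME::Widget"

print "not a reference\n" unless ref($text);
print join(" ", $kind_scalar, $kind_array, $kind_hash,
                $kind_code, $kind_ref, $kind_object), "\n";
```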

You can create references in Perl by taking references to things that are already there or by using the [ ], { }, and sub { } composers. The backslash operator is simple to use: put it before the thing you want a reference to. For instance, if you want a reference to the contents of @array, just say:

$aref = \@array;

You can even create references to constant values; future attempts to change the value of the referent will cause a runtime error:

$pi = \3.14159;
$$pi = 4;           # runtime error
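If you want to observe that error without ending the program, one approach is to wrap the assignment in eval. This is only a sketch; the $ok flag and the printed message format are assumptions made for the example.

```perl
use strict;
use warnings;

my $pi = \3.14159;

# eval traps the runtime error; the block returns 1 only if the assignment succeeds.
my $ok    = eval { $$pi = 4; 1 };
my $error = $@;

print "assignment refused: $error" unless $ok;
print "value is still $$pi\n";
```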

Anonymous Data

Taking references to existing data is helpful when you're using pass-by-reference in a function call, but for dynamic programming, it becomes cumbersome. You need to be able to grow data structures at will, to allocate new arrays and hashes (or scalars or functions) on demand. You don't want to be bogged down with having to give them names each time.

Perl can explicitly create anonymous arrays and hashes, which allocate a new array or hash and return a reference to that memory:

$aref = [ 3, 4, 5 ];                                # new anonymous array
$href = { "How" => "Now", "Brown" => "Cow" };       # new anonymous hash

Perl can also create a reference implicitly by autovivification. This is what happens when you try to assign through an undefined reference and Perl automatically creates the reference you're trying to use.

undef $aref;
@$aref = (1, 2, 3);
print $aref;        # prints something like ARRAY(0x80c04f0)

Notice how we went from an undefined variable to one with an array reference in it without actually assigning anything? Perl filled in the undefined reference for you. This is the property that permits something like this to work as the first statement in your program:

$a[4][23][53][21] = "fred";
print $a[4][23][53][21];    # prints fred
print $a[4][23][53];        # prints something like ARRAY(0x81e2494)
print $a[4][23];            # prints something like ARRAY(0x81e0748)
print $a[4];                # prints something like ARRAY(0x822cd40)
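Autovivification applies to hashes as well as arrays. The following sketch, with invented hash names and keys, builds a hash of arrays and a two-level tally without ever creating the inner containers explicitly.

```perl
use strict;
use warnings;

my (%members, %tally);    # hypothetical names, for illustration only

# $members{fruit} does not exist yet; push creates the anonymous array on demand.
push @{ $members{fruit} }, "apple";
push @{ $members{fruit} }, "pear";

# $tally{fruit} springs into existence as an anonymous hash.
$tally{fruit}{apple}++;

my $list_kind  = ref $members{fruit};           # "ARRAY"
my $tally_kind = ref $tally{fruit};             # "HASH"
my $how_many   = scalar @{ $members{fruit} };   # 2

print "$list_kind with $how_many elements, $tally_kind\n";
```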

The following table shows mechanisms for producing references to both named and anonymous scalars, arrays, hashes, and functions. (Anonymous typeglobs are too scary to show - and virtually never used. It's best to use Symbol::gensym() or IO::Handle->new() for them.)

Reference to	Named	Anonymous
Scalar	`\$scalar`	`\do{my $anon}`
Array	`\@array`	`[ LIST ]`
Hash	`\%hash`	`{ LIST }`
Code	`\&function`	`sub { CODE }`

These diagrams illustrate the differences between named and anonymous values. Figure 11.2 shows named values.

Figure 11.2: Named values

In other words, saying $a = \$b makes $$a and $b the same piece of memory. If you say $$a = 3, then the value of $b is set to 3.
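The same aliasing can be seen in a short sketch. It uses the names $value and $ref rather than $a and $b, an arbitrary choice made here to avoid the special sort variables.

```perl
use strict;
use warnings;

my $value = 1;
my $ref   = \$value;    # $$ref and $value now name the same piece of memory

$$ref = 3;              # writing through the reference changes $value
my $seen_through_name = $value;

$value = 7;             # and writing to $value is visible through $ref
my $seen_through_ref = $$ref;

print "$seen_through_name $seen_through_ref\n";
```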

Figure 11.3 shows anonymous values.

Figure 11.3: Anonymous values

Every reference evaluates as true, by definition, so if you write a subroutine that returns a reference, you can return undef on error and check for it with:

$op_cit = cite($ibid)       or die "couldn't make a reference";
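A function written in that style might look like the following sketch. The find_item function and the %catalog data are hypothetical, invented only to show the pattern.

```perl
use strict;
use warnings;

# Hypothetical data and lookup function, for illustration only.
my %catalog = ( widget => { price => 5 } );

sub find_item {
    my ($name) = @_;
    return unless exists $catalog{$name};   # undef in scalar context signals failure
    return $catalog{$name};                 # a hash reference, which is always true
}

my $found   = find_item("widget") or die "couldn't find widget";
my $missing = find_item("gadget");

print "widget costs $found->{price}\n";
print "no gadget in the catalog\n" unless $missing;
```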

The undef operator can be used on any variable or function in Perl to release the value it holds. When that value is a reference, this does not necessarily free the referent's memory, call object destructors, and so on; it just decrements the referent's reference count by one. Without an argument, undef produces an undefined value.
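The effect on reference counts can be observed with a destructor. The sketch below assumes a throwaway class named Tracker and a counter variable, both invented for this example.

```perl
use strict;
use warnings;

my $destroyed = 0;      # counts destructor calls, for illustration only

{
    package Tracker;    # hypothetical class, not from the text
    sub new     { return bless {}, shift }
    sub DESTROY { $destroyed++ }
}

my $first  = Tracker->new;
my $second = $first;            # two references, one referent

undef $first;                   # reference count drops from 2 to 1
my $after_first = $destroyed;   # still 0: the object is alive

undef $second;                  # reference count drops to 0
my $after_second = $destroyed;  # now 1: the destructor has run

print "$after_first $after_second\n";
```

The destructor runs only after the last reference is gone, not at the first undef.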

Records

The classic use of references in Perl is to circumvent the restriction that arrays and hashes may hold scalars only. References are scalars, so to make an array of arrays, make an array of array references. Similarly, hashes of hashes are implemented as hashes of hash references, arrays of hashes as arrays of hash references, hashes of arrays as hashes of array references, and so on.
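Two of those combinations are sketched below with invented data: an array of arrays and a hash of arrays. The names @matrix and %pets are assumptions.

```perl
use strict;
use warnings;

# Array of arrays: each element of @matrix is an array reference.
my @matrix = (
    [ 1, 2, 3 ],
    [ 4, 5, 6 ],
);

# Hash of arrays: each value of %pets is an array reference.
my %pets = (
    fred   => [ "dino" ],
    barney => [ "hoppy", "rex" ],
);

my $cell        = $matrix[1][2];             # row 1, column 2
my $row_length  = scalar @{ $matrix[0] };    # number of elements in the first row
my $second_pet  = $pets{barney}[1];
push @{ $pets{fred} }, "puss";               # grow one of the inner arrays

print "$cell $row_length $second_pet @{ $pets{fred} }\n";
```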

Once you have these complex structures, you can use them to implement records. A record is a single logical unit composed of different attributes. For instance, a name, an address, and a birthday might comprise a record representing a person. C calls such things structs, and Pascal calls them RECORDs. Perl doesn't have a particular name for these because you can implement this notion in different ways.

The most common technique in Perl is to treat a hash as a record, where the keys of the hash are the record's field names and the values of the hash are those fields' values.

For instance, we might create a "person" record like this:

$Nat = { "Name"     => "Leonhard Euler",
         "Address"  => "1729 Ramanujan Lane\nMathworld, PI 31416",
         "Birthday" => 0x5bb5580,
       };

Because $Nat is a scalar, it can be stored in an array or hash element, thus creating groups of people. Now apply the array and hash techniques from Chapters 4 and 5 to sort the sets, merge hashes, pick a random record, and so on.
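As a rough sketch of such a group, the following stores several hash records in an array, sorts them by one field, and indexes them by name. The people and their field values are invented.

```perl
use strict;
use warnings;

# Hypothetical records, for illustration only.
my @people = (
    { Name => "Carol", Age => 41 },
    { Name => "Alice", Age => 29 },
    { Name => "Bob",   Age => 35 },
);

# Sort the records by their Name field.
my @sorted = sort { $a->{Name} cmp $b->{Name} } @people;

# Index the same records by name; the hash values are the same references.
my %by_name;
$by_name{ $_->{Name} } = $_ for @people;

print join(", ", map { $_->{Name} } @sorted), "\n";
print "Bob is $by_name{Bob}{Age}\n";
```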

The attributes of a record, including the "person" record, are always scalars. You can certainly use numbers as readily as strings there, but that's no great trick. The real power play happens when you use even more references for values in the record. "Birthday", for instance, might be stored as an anonymous array with three elements: day, month, and year. You could then say $person->{"Birthday"}->[0] to access just the day field. Or a date might be represented as a hash record, which would then lend itself to access such as $person->{"Birthday"}->{"day"}. Adding references to your collection of skills makes possible many more complex and useful programming strategies.
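Both layouts for "Birthday" can be sketched as follows. The records and the date values are sample data supplied for illustration, and the variable names are assumptions.

```perl
use strict;
use warnings;

# Birthday as an anonymous array: day, month, year (sample values).
my $person = {
    "Name"     => "Leonhard Euler",
    "Birthday" => [ 15, 4, 1707 ],
};
my $day = $person->{"Birthday"}->[0];

# Birthday as a nested hash record.
my $other = {
    "Name"     => "Leonhard Euler",
    "Birthday" => { "day" => 15, "month" => 4, "year" => 1707 },
};
my $year = $other->{"Birthday"}->{"year"};

# The arrow between adjacent subscripts is optional.
my $month = $other->{"Birthday"}{"month"};

print "$day/$month/$year\n";
```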

At this point, we've conceptually moved beyond simple records. We're now creating elaborate data structures that represent complicated relationships between the data they hold. Although we can use these to implement traditional data structures like linked lists, the recipes in the second half of this chapter don't deal specifically with any particular structure. Instead, they give generic techniques for loading, printing, copying, and saving generic data structures. The final program example demonstrates how to manipulate binary trees.