5.3 Objects

First of all, you need to understand packages and modules as previously described in this chapter. You also need to know what references and referenced thingies are in Perl; see Chapter 4, References and Nested Data Structures, for that.

It's also helpful to understand a little about object-oriented programming (OOP), so in the next section we'll give you a little course on OOL (object-oriented lingo).

5.3.1 Brief Refresher on Object-Oriented Programming

An object is a data structure with a collection of behaviors. We generally speak of behaviors as being performed by the object directly, sometimes to the point of anthropomorphizing the object. For example, we might say that a rectangle "knows" how to display itself on the screen, or "knows" how to compute its own area.

An object gets its behaviors by being an instance of a class. The class defines methods that apply to all objects belonging to that class, called instance methods.

The class will also likely include instance-independent methods, called class methods.[9] Some class methods create new objects of the class, and are called constructor methods (such as "create a new rectangle with width 10 and height 5"). Other class methods might perform operations on many objects collectively ("display all rectangles"), or provide other necessary operations ("read a rectangle from this file").

[9] Or sometimes static methods.

A class may be defined so as to inherit both class and instance methods from parent classes, also known as base classes. This allows a new class to be created that is similar to an existing class, but with added behaviors. Any method that is not found in a particular class is searched for in its parent classes automatically. For example, a rectangle class might inherit some common behaviors from a generic polygon class.

While you might know the particular implementation of an object, generally you should treat the object as a black box. All access to the object should be obtained through the published interface via the provided methods. This allows the implementation to be revised, as long as the interface remains frozen (or at least, upward compatible). By published interface, we mean the written documentation describing how to use a particular class. (Perl does not have an explicit interface facility apart from this. You are expected to exercise common sense and common decency.)

Objects of different classes may be held in the same variable at different times. When a method is invoked on the contents of the variable, the proper method for the object's class gets selected automatically. If, for example, the draw() method is invoked on a variable that holds either a rectangle or a circle, the method actually used depends on the current nature of the object to which the variable refers. For this to work, however, the methods for drawing circles and rectangles must both be called draw().
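Looking ahead to Perl's own syntax (covered later in this section), here is a minimal sketch of that behavior. It assumes two hypothetical classes, Rectangle and Circle, whose draw() methods return a description string instead of actually drawing anything:

```perl
use strict;
use warnings;

# Hypothetical classes: each defines its own method named draw().
package Rectangle;
sub new  { my ($class, %args) = @_; return bless {%args}, $class }
sub draw { my $self = shift; return "rectangle $self->{width}x$self->{height}" }

package Circle;
sub new  { my ($class, %args) = @_; return bless {%args}, $class }
sub draw { my $self = shift; return "circle of radius $self->{radius}" }

package main;

# The same variable, $shape, holds objects of different classes in turn;
# each call to $shape->draw() selects the draw() of the current object's class.
my @drawn;
foreach my $shape (Rectangle->new(width => 10, height => 5),
                   Circle->new(radius => 3)) {
    push @drawn, $shape->draw();
}
print "$_\n" for @drawn;
```

The loop never asks which kind of shape it has; that decision is made by the method lookup.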

Admittedly, there's a lot more to objects than this, and a lot of ways to find out more. But that's not our purpose here. So, on we go.

5.3.2 Perl's Objects

Here are three simple definitions that you may find reassuring:

An object is simply a referenced thingy that happens to know which class it belongs to.
A class is simply a package that happens to provide methods to deal with objects.
A method is simply a subroutine that expects an object reference (or a package name, for class methods) as its first argument.

We'll cover these points in more depth now.

5.3.3 An Object Is Simply a Referenced Thingy

Perl doesn't provide any special syntax for constructors. A constructor is merely a subroutine that returns a reference to a thingy that it has blessed into a class, generally the class in which the subroutine is defined. The constructor does this using the built-in bless function, which marks a thingy as belonging to a particular class. It takes either one or two arguments: the first argument is a regular hard reference to any kind of thingy, and the second argument (if present) is the package that will own the thingy. If no second argument is supplied, the current package is assumed. Here is a typical constructor:

package Critter;
sub new { return bless {}; }

The {} composes a reference to an empty anonymous hash. The bless function takes that hash reference and tells the thingy it references that it's now a member of the class Critter, and returns the reference. The same thing can be accomplished more explicitly this way:

sub new {
    my     $obref = {};         # ref to empty hash
    bless  $obref;              # make it an object in this class
    return $obref;              # return it
}

Once a reference has been blessed into a class, you can invoke the class's instance methods upon it. For example:

$circle->draw();

We'll discuss method invocation in more detail below.
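As a small runnable sketch of the points so far, the following uses the one-argument constructor shown above plus a hypothetical name() accessor (not part of the original example). It checks that ref reports the class the thingy was blessed into, and that inside the class the object is still an ordinary hash:

```perl
use strict;
use warnings;

package Critter;
sub new  { return bless {}; }            # one-argument bless: into Critter
sub name {                               # hypothetical accessor for illustration
    my $self = shift;
    $self->{name} = shift if @_;         # set when given an argument
    return $self->{name};
}

package main;

my $critter = Critter->new();
$critter->name("Fred");                  # instance method invoked on the blessed reference
print ref($critter), "\n";               # name of the class the thingy belongs to
print join(",", keys %$critter), "\n";   # underneath, still a plain hash
```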

Sometimes constructors call other methods in the class as part of the construction. Here we'll call an _initialize() method, which may be in the current package or in one of the classes (packages) that this class inherits from. The leading underscore is an oft-used convention indicating that the function is private, that is, to be used only by the class itself. This result can also be achieved by omitting the function from the published documentation for that class.

sub new {
    my $self = {};
    bless $self;
    $self->_initialize();
    return $self;
}

If you want your constructor method to be (usefully) inheritable, then you must use the two-argument form of bless. That's because the one-argument form always blesses into the package in which the constructor was compiled, which for an inherited constructor is the base class rather than the derived class named in the call. For example, suppose you have a Polygon class that has a new() method as a constructor. This works fine when called as Polygon->new(). But then you decide to also have a Square class, which inherits methods from the Polygon class. The only way for that constructor to build an object of the proper class when it is called as Square->new() is by using the two-argument form of bless, as in the following example:

sub new {
    my $class = shift;
    my $self = {};
    bless $self, $class;        # bless $self into the designated class
    $self->_initialize();       # in case there's more work to do
    return $self;
}
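The difference can be checked directly. This is a sketch assuming a hypothetical Polygon class with the two-argument constructor above, a simple _initialize() that records a sides field, and a hypothetical new_one_arg() constructor added only for contrast:

```perl
use strict;
use warnings;

package Polygon;
sub new {
    my $class = shift;
    my $self  = {};
    bless $self, $class;          # two-argument bless: uses the invoking class
    $self->_initialize(@_);
    return $self;
}
sub _initialize {                 # hypothetical private initializer
    my ($self, %args) = @_;
    $self->{sides} = $args{sides};
}
sub new_one_arg {                 # hypothetical, for contrast only
    my $self = {};
    bless $self;                  # always blesses into Polygon
    return $self;
}

package Square;
our @ISA = ('Polygon');           # Square inherits new() and _initialize()

package main;

my $sq   = Square->new(sides => 4);
my $oops = Square->new_one_arg;
print ref($sq), "\n";             # Square
print ref($oops), "\n";           # Polygon
```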

Within the class package, methods will typically deal with the reference as an ordinary (unblessed) reference to a thingy. Outside the class package, the reference should generally be treated as an opaque value that may only be accessed through the class's methods. (Mutually consenting classes may of course do whatever they like with each other, but even that doesn't necessarily make it right.)

A constructor may re-bless a referenced object currently belonging to another class, but then the new class is responsible for all cleanup later. The previous blessing is forgotten, as an object may only belong to one class at a time. (Although of course it's free to inherit methods from many classes.)

A clarification: Perl objects are blessed. References are not. Thingies know which package they belong to. References do not. The bless operator simply uses the reference in order to find the thingy. Consider the following example:

$a = {};                 # generate reference to hash
$b = $a;                 # reference assignment (shallow)
bless $b, "Mountain";
bless $a, "Fourteener";
print "\$b is a ", ref($b), "\n";

This reports $b as being a member of class Fourteener, not a member of class Mountain, because the second blessing operates on the underlying thingy that $a refers to, not on the reference itself. Thus is the first blessing forgotten.

5.3.4 A Class Is Simply a Package

Perl doesn't provide any special syntax for class definitions. You just use a package as a class by putting method definitions into the package.

Within each package a special array called @ISA tells Perl where else to look for a method if it can't find the method in that package. This is how Perl implements inheritance. Each element of the @ISA array is just the name of another package that happens to be used as a class. The packages are recursively searched (depth first) for missing methods, in the order that packages are mentioned in @ISA. This means that if you have two different packages (say, Mom and Dad) in a class's @ISA, Perl would first look for missing methods in Mom and all of her ancestor classes before going on to search through Dad and his ancestors. Classes accessible through @ISA are known as base classes of the current class, which is itself called the derived class.[10]

[10] Instead of "base class" and "derived class", some OOP literature uses superclass for the more generic classes and subclass for the more specific ones. Confusing the issue further, some literature uses "base class" to mean a "most super" superclass. That's not what we mean by it.
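The Mom-and-Dad search order can be observed with a short sketch. The classes Grandma, Mom, Dad, and Child and the method greet() are hypothetical; Mom defines no greet() herself, so the depth-first search reaches her ancestor Grandma before it ever considers Dad:

```perl
use strict;
use warnings;

{ package Grandma; sub greet { return "Grandma" } }
{ package Mom;     our @ISA = ('Grandma'); }       # no greet() of her own
{ package Dad;     sub greet { return "Dad" } }
{ package Child;   our @ISA = ('Mom', 'Dad'); }    # Mom is listed before Dad

# Search order: Child, Mom, Grandma, and only then Dad.
my $who = Child->greet();
print "$who\n";
```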

If a missing method is found in one of the base classes, Perl internally caches that location in the current class for efficiency, so the next time it has to find the method, it doesn't have to look so far. Changing @ISA or defining new subroutines invalidates this cache and causes Perl to do the lookup again.

If a method isn't found but an AUTOLOAD routine is found, then that routine is called on behalf of the missing method, with that package's $AUTOLOAD variable set to the fully qualified method name.

If neither a method nor an AUTOLOAD routine is found in @ISA, then one last, desperate try is made for the method (or an AUTOLOAD routine) in the special predefined class called UNIVERSAL. This package predefines only a few general-purpose methods (such as isa, can, and VERSION), but you may place your own "last-ditch" methods there. Think of it as a global base class from which all other classes implicitly derive.

If the method still can't be found, Perl finally gives up and complains by raising an exception.
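Here is a minimal sketch of both ends of the search. The method name describe() is a hypothetical last-ditch method placed in UNIVERSAL, and fly() is a method that no class defines, so its lookup fails and the resulting exception is caught with eval:

```perl
use strict;
use warnings;

{ package Critter; sub new { return bless {}, shift } }

# Hypothetical last-ditch method: every class inherits it through UNIVERSAL.
sub UNIVERSAL::describe {
    my $self = shift;
    return "an object of class " . ref($self);
}

my $critter = Critter->new;
my $desc    = $critter->describe;        # not in Critter, found in UNIVERSAL

# Nothing defines fly(), so Perl raises an exception; trap it with eval.
my $ok  = eval { $critter->fly; 1 };
my $err = $@;
print "$desc\n";
print "fly failed: $err" unless $ok;
```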

Perl classes do only method inheritance. Data inheritance is left up to the class itself. By and large, this is not a problem in Perl, because most classes model the attributes of their objects using an anonymous hash. All the object's data fields (termed "instance variables" in some languages) are contained within this anonymous hash instead of being built into the language itself. This hash serves as its own little namespace to be carved up by the various classes that might want to do something with the object. For example, if you want an object called $user_info to have a data field named age, you can simply access $user_info->{age}. No declarations are necessary. See the section on "Instance Variables" under "Some Hints About Object Design" later in this chapter.
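To illustrate "data inheritance is left up to the class itself", here is a sketch assuming hypothetical Person and Employee classes. The derived constructor lets the base class set up its fields, then adds its own keys to the same hash; a new field such as age is simply assigned:

```perl
use strict;
use warnings;

{
    package Person;
    sub new {
        my ($class, %args) = @_;
        return bless { name => $args{name} }, $class;   # Person's fields
    }
}
{
    package Employee;
    our @ISA = ('Person');
    sub new {
        my ($class, %args) = @_;
        my $self = Person::new($class, %args);   # base class fills in its fields
        $self->{salary} = $args{salary};          # then Employee adds its own
        return $self;
    }
}

my $user_info = Employee->new(name => "Jason", salary => 1000);
$user_info->{age} = 23;                           # no declaration needed
```

Because Person::new uses the two-argument bless, the object ends up in class Employee even though Person's code created it.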

5.3.5 A Method Is Simply a Subroutine

Perl doesn't provide any special syntax for method definition. (It does provide a little syntax for method invocation, though. More on that later.) A method expects its first argument to indicate the object or package it is being invoked on.

5.3.5.1 Class methods

A class method expects a class (package) name as its first argument. (The class name isn't blessed; it's just a string.) These methods provide functionality for the class as a whole, not for any individual object instance belonging to the class. Constructors are typically written as class methods. Many class methods simply ignore their first argument, since they already know what package they're in, and don't care what package they were invoked via. (These aren't necessarily the same, since class methods follow the inheritance tree just like ordinary instance methods.)

Another typical use for class methods might be to look up an object by some nickname in a global registry:

sub find {
    my ($class, $nickname) = @_;      # $class is not needed here
    return $objtable{$nickname};      # %objtable: global nickname => object registry
}
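A self-contained version might look like the following sketch. It assumes, hypothetically, that the constructor registers each new object under its nickname in a file-scoped %objtable, so that the class method find() can look it up later:

```perl
use strict;
use warnings;

{
    package Critter;
    my %objtable;                          # hypothetical registry: nickname => object

    sub new {
        my ($class, $nickname) = @_;
        my $self = bless { nickname => $nickname }, $class;
        $objtable{$nickname} = $self;      # register the new object
        return $self;
    }

    sub find {                             # class method; ignores $class
        my ($class, $nickname) = @_;
        return $objtable{$nickname};
    }
}

my $fred  = Critter->new("Fred");
my $found = Critter->find("Fred");
```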

5.3.5.2 Instance methods

An instance method expects an object reference[11] as its first argument. Typically it shifts the first argument into a private variable (often called $self or $this depending on the cultural biases of the programmer), and then it uses the variable as an ordinary reference:

[11] By which we mean simply an ordinary hard reference that happens to point to an object thingy. Remember that the reference itself doesn't know or care whether its thingy is blessed.

sub display {
    my $self = shift;
    my @keys;
    if (@_ == 0) {                  # no further arguments
        @keys = sort keys(%$self);
    }  else {
        @keys = @_;                 # use the ones given
    }
    foreach my $key (@keys) {
        print "\t$key => $self->{$key}\n";
    }
}

Although it may seem counterintuitive to object-oriented novices, it's a good idea not to check the type of object that caused the instance method to be invoked. If you do, it can get in the way of inheritance.

5.3.5.3 Dual-nature methods

Because there is no language-defined distinction between definitions of class methods and instance methods (nor arbitrary functions, for that matter), you could actually have the same method work for both purposes. It just has to check whether it was passed a reference or not. Suppose you want a constructor that can figure out its class from either a classname or an existing object. Here's an example of the two uses of such a method:

$ob1  = StarKnight->new();
$luke = $ob1->new();

Here's how such a method might be defined. We use the ref function to find out the type of the object the method was called on so our new object can be blessed into that class. If ref returns false, then our $self argument isn't an object, so it must be a class name.

package StarKnight;
sub new {
    my $self  = shift;
    my $type  = ref($self) || $self;
    return bless {}, $type;
}
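Because the class is taken from the invocant at run time, the same constructor also works when inherited. This sketch repeats StarKnight::new and adds a hypothetical subclass, Jedi, that defines no constructor of its own:

```perl
use strict;
use warnings;

{
    package StarKnight;
    sub new {
        my $self = shift;
        my $type = ref($self) || $self;   # object => its class; class name => itself
        return bless {}, $type;
    }
}
{
    package Jedi;                          # hypothetical subclass
    our @ISA = ('StarKnight');
}

my $ob1  = StarKnight->new();   # invoked with a class name
my $luke = $ob1->new();         # invoked with an existing object
my $yoda = Jedi->new();         # inherited constructor, class name "Jedi"
my $ben  = $yoda->new();        # inherited constructor, invoked on a Jedi object
```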

5.3.6 Method Invocation

Perl supports two different syntactic forms for explicitly invoking class or instance methods.[12] Unlike normal function calls, method calls always receive, as their first parameter, the appropriate class name or object reference upon which they were invoked.

[12] Methods may also be called implicitly due to object destructors, tied variables, or operator overloading. Properly speaking, none of these is a function invocation. Rather, Perl uses the information presented via the syntax to determine which function to call. Operator overloading is implemented by the standard overload module as described separately in Chapter 7.

The first syntax form looks like this:

METHOD CLASS_OR_INSTANCE LIST

Since this is similar to using the filehandle specification with print or printf, and also similar to English sentences like "Give the dog the bone," we'll call it the indirect object form. To look up an object with the class method find, and to print out some of its attributes with the instance method display, you could say this:

$fred = find Critter "Fred";
display $fred 'Height', 'Weight';

The indirect object form allows a BLOCK returning an object (or class) in the indirect object slot, so you can combine these into one statement:

display { find Critter "Fred" } 'Height', 'Weight';

The second syntax form looks like this:

CLASS_OR_INSTANCE->METHOD(LIST)

This second syntax employs the -> notation. It is sometimes called the object-oriented syntax. The parentheses are required if there are any arguments, because this form can't be used as a list operator, although the first form can.

$fred = Critter->find("Fred");
$fred->display('Height', 'Weight');

Or, you can put the above in only one statement, like this:

Critter->find("Fred")->display('Height', 'Weight');

There are times when one syntax is more readable, and times when the other syntax is more readable. The indirect object syntax is less cluttered, but it has the same ambiguity as ordinary list operators. If there is an open parenthesis following the class or object, then the matching close parenthesis terminates the list of arguments. Thus, the parentheses of

new Critter ('Barney', 1.5, 70);

are assumed to surround all the arguments of the method call, regardless of what comes afterward. Therefore, saying

new Critter ('Bam' x 2), 1.4, 45;

would be equivalent to

Critter->new('Bam' x 2), 1.4, 45;

which is unlikely to do what you want since the 1.4 and 45 are not being passed to the new() routine.

There may be occasions when you need to specify which class's method to use. In that case, you could call your method as an ordinary subroutine call, being sure to pass the requisite first argument explicitly:

$fred = MyCritter::find("Critter", "Fred");
MyCritter::display($fred, 'Height', 'Weight');

However, this does not do any inheritance. If you merely want to specify that Perl should start looking for a method in a particular package, use an ordinary method call, but qualify the method name with the package like this:

$fred = Critter->MyCritter::find("Fred");
$fred->MyCritter::display('Height', 'Weight');

If you're trying to control where the method search begins and you're executing in the class package itself, then you may use the SUPER pseudoclass, which says to start looking in the base classes listed in the current package's @ISA, without having to name them explicitly:

$self->SUPER::display('Height', 'Weight');

The SUPER construct is meaningful only when used inside the class methods; while writers of class modules can employ SUPER in their own code, people who merely use class objects cannot.
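Here is a sketch of the typical pattern: a derived class overrides a method and then delegates to the inherited version through SUPER. The classes Critter and MyCritter and the string-returning display() are hypothetical stand-ins for the printing version shown earlier:

```perl
use strict;
use warnings;

{
    package Critter;
    sub new     { my $class = shift; return bless { @_ }, $class }
    sub display {
        my ($self, @keys) = @_;
        return join ", ", map { "$_=$self->{$_}" } @keys;
    }
}
{
    package MyCritter;
    our @ISA = ('Critter');
    sub display {                          # override, then delegate upward
        my ($self, @keys) = @_;
        return "MyCritter(" . $self->SUPER::display(@keys) . ")";
    }
}

my $fred = MyCritter->new(Height => 6, Weight => 200);
my $line = $fred->display('Height', 'Weight');
```

SUPER is resolved relative to the package the method was compiled in (MyCritter here), not the class of $self, which is why it only makes sense inside the class's own code.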

Sometimes you want to call a method when you don't know the method name ahead of time. You can use the arrow form, replacing the method name with a simple scalar variable (not an expression or indexed aggregate) containing the method name:

$method = $fast ? "findfirst" : "findbest";
$fred->$method(@args);
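A runnable sketch of the same idea follows. The semantics given to findfirst() and findbest() are hypothetical (return the first stored score, or the highest one); only the method name held in $method decides which one runs:

```perl
use strict;
use warnings;

{
    package Critter;
    sub new       { my ($class, @scores) = @_; return bless { scores => [@scores] }, $class }
    sub findfirst { my $self = shift; return $self->{scores}[0] }   # hypothetical: quick
    sub findbest  {                                                 # hypothetical: thorough
        my $self = shift;
        my @sorted = sort { $b <=> $a } @{ $self->{scores} };
        return $sorted[0];
    }
}

my $fred = Critter->new(3, 9, 5);
my %answer;
foreach my $fast (1, 0) {
    my $method = $fast ? "findfirst" : "findbest";   # method name chosen at run time
    $answer{$fast} = $fred->$method();
}
```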

We mentioned that the object-oriented notation is less syntactically ambiguous than the indirect object notation, even though the latter is less cluttered. Here's why: An indirect object is limited to a name, a scalar variable, or a BLOCK.[13] (If you try to put anything more complicated in that slot, it will not be parsed as you expect.) The left side of -> is not so limited. This means that A and B below are equivalent to each other, and C and D are also equivalent, but A and B differ from C and D:

[13] Attentive readers will recall that this is precisely the same list of syntactic items that are allowed after a funny character to indicate a variable dereference - for example, @ary, @$aryref, or @{$aryref}.

A: method $obref->{fieldname}
B: (method $obref)->{fieldname}

C: $obref->{fieldname}->method()
D: method {$obref->{fieldname}}

In A and B, the method applies to $obref, which must yield a hash reference with "fieldname" as a key. In C and D the method applies to $obref->{fieldname}, which must evaluate to an object appropriate for the method.

5.3.7 Destructors

When the last reference to an object goes away, the object is automatically destroyed. (This may even be after you exit, if you've stored references in global variables.) If you want to capture control just before the object is freed, you may define a DESTROY method in your class. It will automatically be called at the appropriate moment, and you can do any extra cleanup you desire. (Perl does the memory management cleanup for you automatically.)

Perl does not do nested destruction for you. If your constructor re-blessed a reference from one of your base classes, your DESTROY method may need to call DESTROY for any base classes that need it. But this only applies to re-blessed objects; an object reference that is merely contained within the current object - as, for example, one value in a larger hash - will be freed and destroyed automatically. This is one of the reasons why containership via mere aggregation (sometimes called a "has-a" relationship) is often cleaner and clearer than inheritance (an "is-a" relationship). In other words, often you really only need to store one object inside another directly instead of employing inheritance, which can add unnecessary complexity.
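The "has-a" case can be seen in a short sketch with two hypothetical classes: Outer stores an Inner object in its hash. When the last reference to the Outer object goes away, its DESTROY runs, and then the contained Inner object is freed and destroyed automatically, without Outer::DESTROY doing anything:

```perl
use strict;
use warnings;

our @destroyed;                      # records DESTROY calls, in order

{
    package Inner;
    sub new     { return bless {}, shift }
    sub DESTROY { push @main::destroyed, ref shift }
}
{
    package Outer;
    sub new     { my $class = shift; return bless { part => Inner->new }, $class }  # has-a
    sub DESTROY { push @main::destroyed, ref shift }
}

{
    my $whole = Outer->new;
}                                    # last reference to $whole disappears here
print "@destroyed\n";
```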

5.3.8 Method Autoloading

After Perl has vainly looked through an object's class package and the packages of its base classes to find a method, it also checks for an AUTOLOAD routine in each package before concluding that the method can't be found. One could use this property to provide an interface to the object's data fields (instance variables) without writing a separate function for each. Consider the following code:

use Person;
$him = new Person;
$him->name("Jason");
$him->age(23);
$him->peers( ["Norbert", "Rhys", "Phineas"] );
printf "%s is %d years old.\n", $him->name, $him->age;
print "His peers are: ", join(", ", @{$him->peers}), ".\n";

The Person class implements a data structure containing three fields: name, age, and peers. Instead of accessing the objects' data fields directly, you use supplied methods to do so. To set one of these fields, call a method of that name with an argument of the value the field should be set to. To retrieve one of the fields without setting it, call the method without an argument. Here's the code that does that:

package Person;
use Carp;       # see Carp.pm in Chapter 7

my %fields = (
    name        => undef,
    age         => undef,
    peers       => undef,
);

sub new {
    my $that  = shift;
    my $class = ref($that) || $that;
    my $self  = {
        %fields,
    };
    bless $self, $class;
    return $self;
} 

our $AUTOLOAD;      # set by Perl to the fully qualified name of the missing method

sub AUTOLOAD {
    my $self = shift;
    my $type = ref($self) || croak "$self is not an object";
    my $name = $AUTOLOAD;
    $name =~ s/.*://;   # strip fully-qualified portion
    return if $name eq 'DESTROY';   # object destruction is not a field access
    unless (exists $self->{$name} ) {
        croak "Can't access `$name' field in object of class $type";
    }
    if (@_) {
        return $self->{$name} = shift;
    } else {
        return $self->{$name};
    }
}

As you see, there isn't really a method named name(), age(), or peers() to be found anywhere. The AUTOLOAD routine takes care of all of these. This class is a fairly generic implementation of something analogous to a C structure. A more complete implementation of this notion can be found in the Class::Template module on CPAN. The Alias module found there may also prove useful for simplifying member access.[14]

[14] CPAN is the Comprehensive Perl Archive Network, as described in the Preface.

5.3.9 A Note on Garbage Collection

High-level languages typically allow programmers to dispense with worrying about deallocating memory when they're done using it. This automatic reclamation process is known as garbage collection. For most purposes, Perl uses a fast and simple garbage collection system based on reference counting. One serious concern is that unreachable memory with a non-zero reference count will normally not get freed. Therefore, saying this is a bad idea:

{               # make $a and $b point to each other
    my($a, $b);
    $a = \$b;
    $b = \$a;
}

or more simply:

{               # make $a point to itself
    my $a;
    $a = \$a;
}

When a block is exited, its my variables are normally freed up. But their internal reference counts can never go to zero, because the variables point at each other or at themselves. This is called a circular reference. No one outside the block can reach them, which makes them useless. But even though they should go away, they can't. When building recursive data structures, you'll have to break the self-reference explicitly yourself if you don't care to cause a memory leak.
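The effect of breaking a cycle by hand can be measured with a counter in DESTROY. In this sketch the Node class and its peer field are hypothetical: the first block breaks the cycle before exiting and both nodes are reclaimed, while the second block leaves it intact and neither node is reclaimed until global destruction:

```perl
use strict;
use warnings;

my $freed = 0;                       # counts Node objects whose DESTROY has run

{
    package Node;                    # hypothetical class for this sketch
    sub new     { return bless { peer => undef }, shift }
    sub DESTROY { $freed++ }
}

{   # Cycle broken explicitly: both nodes are freed at the end of the block.
    my $x = Node->new;
    my $y = Node->new;
    $x->{peer} = $y;
    $y->{peer} = $x;
    undef $x->{peer};                # break the cycle
}
my $after_broken = $freed;

{   # Cycle left intact: neither node is freed when the block exits.
    my $x = Node->new;
    my $y = Node->new;
    $x->{peer} = $y;
    $y->{peer} = $x;
}
my $after_leak = $freed;
```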

For example, here's a self-referential node such as one might use in a sophisticated tree structure:

sub new_node {
    my $self = shift;
    my $class = ref($self) || $self;
    my $node = {};
    $node->{LEFT} = $node->{RIGHT} = $node;
    $node->{DATA} = [ @_ ];
    return bless $node, $class;
}

If you create nodes like this, they (currently)[15] won't ever go away unless you break the circular references yourself.

[15] In other words, this behavior is not to be construed as a feature, and you shouldn't depend on it. Someday, Perl may have a full mark-and-sweep style garbage collection as in Lisp or Scheme. If that happens, it will properly clean up memory lost to unreachable circular data.

Well, almost never.

When an interpreter thread finally shuts down (usually when your program exits), then a complete pass of garbage collection is performed, and everything allocated by that thread gets destroyed. This is essential to support Perl as an embedded or a multithreadable language. When a thread shuts down, all its objects must be properly destructed, and all its memory has to be reclaimed. The following program demonstrates Perl's multi-phased garbage collection:

#!/usr/bin/perl
package Subtle;

sub new {
    my $test;
    $test = \$test;   # Create a self-reference.
    warn "CREATING " . \$test;
    return bless \$test;
}

sub DESTROY {
    my $self = shift;
    warn "DESTROYING $self";
}

package main;

warn "starting program";
{
    my $a = Subtle->new;
    my $b = Subtle->new;
    $$a = 0;           # Break this self-reference, but not the other.
    warn "leaving block";
}

warn "just exited block";
warn "time to die...";
exit;

When run as /tmp/try, the following output is produced:

starting program at /tmp/try line 18.
CREATING SCALAR(0x8e5b8) at /tmp/try line 7.
CREATING SCALAR(0x8e57c) at /tmp/try line 7.
leaving block at /tmp/try line 23.
DESTROYING Subtle=SCALAR(0x8e5b8) at /tmp/try line 13.
just exited block at /tmp/try line 26.
time to die... at /tmp/try line 27.
DESTROYING Subtle=SCALAR(0x8e57c) during global destruction.

Notice that "global destruction" in the last line? That's the thread garbage collector reaching the unreachable.

Objects are always destructed even when regular references aren't, and in fact are destructed in a separate pass before ordinary references. This is an attempt to prevent object destructors from using references that have themselves been destructed. Plain references are (currently) only garbage collected if the "destruct level" is greater than 0, which is usually only true when Perl is invoked as an embedded interpreter. You can test the higher levels of global destruction in the regular Perl executable by setting the PERL_DESTRUCT_LEVEL environment variable (presuming the -DDEBUGGING option was enabled at Perl build time).