7.2 Objects in Perl

Let us define a few preliminary terms before we start implementing objects in Perl.

An object (also called an instance), like a given car, has the following:

Attributes or properties (color: red; seating capacity: 4; power: 180 HP)
Identity (my car is different from your car)
Behavior (it can be steered and moved forward and backward)

Objects of a certain type are said to belong to a class. My car and your car belong to the class called Car or, if you are not too worried about specific details, to a class called Vehicle. All objects of a class have the same functionality.

In this section, we study how to create objects and how to enrich basic designs using inheritance and polymorphism.

7.2.1 Attributes

An object is a collection of attributes. An array or a hash can be used to represent this set, as we discussed in Chapter 2, Implementing Complex Data Structures. For example, if you need to keep track of an employee's particulars, you might choose one of these approaches:

# Use a hash table to store Employee attributes
%employee = ("name"     => "John Doe",
             "age"      => 32,
             "position" => "Software Engineer");
print "Name: ", $employee{name};

# Or use an array
$name_field = 0; $age_field = 1; $position_field = 2;
@employee = ("John Doe", 32, "Software Engineer");
print "Name: ", $employee[$name_field];

The section "Efficient Attribute Storage" in Chapter 8, Object Orientation: The Next Few Steps describes a more efficient approach for storing attributes. Meanwhile, we will use a hash table for all our examples.

7.2.2 Unique Identity

Clearly, one %employee won't suffice. Each employee requires a unique identity and his or her own collection of attributes. You can either allocate this structure dynamically or return a reference to a local data structure, as shown below:

# Using an anonymous hash
sub new_employee {
    my ($name, $age, $starting_position) = @_;
    my $r_employee = {                  # Create a unique object 
        "name"     => $name,            # using an anonymous hash
        "age"      => $age,
        "position" => $starting_position
    };
    return $r_employee;                 # Return "object"
}

# OR, returning a reference to a local variable
sub new_employee {
    my ($name, $age, $starting_position) = @_;
    my %employee = (
        "name"     => $name,
        "age"      => $age,
        "position" => $starting_position
    );
    return \%employee;  # return a reference to a local object
}
# Use it to create two employees
$emp1 = new_employee("John Doe",   32, "Software Engineer");
$emp2 = new_employee("Norma Jean", 25, "Vice President");

new_employee() returns a reference to a unique data structure in both cases.

As a user of this subroutine, you are not expected to know whether this scalar contains a reference to a Perl data structure or whether it contains a string (for example, it could just contain a database primary key, while the rest of the details are in a corporate database). The employee details are hence well encapsulated. Note that encapsulation should not be confused with enforced privacy.

In the preceding example, the hash table is the object, and the reference to the hash table is termed the object reference. Keep in mind that we have not introduced any new syntax since the last chapter.
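The point about identity can be checked with a small sketch. It reuses the anonymous-hash version of new_employee() from above; the comparison and the age update are only illustrative.

```perl
use strict;
use warnings;

# Constructor from the text: each call builds a fresh anonymous hash
sub new_employee {
    my ($name, $age, $starting_position) = @_;
    return {
        "name"     => $name,
        "age"      => $age,
        "position" => $starting_position,
    };
}

my $emp1 = new_employee("John Doe",   32, "Software Engineer");
my $emp2 = new_employee("Norma Jean", 25, "Vice President");

# The two references point to different hashes
print "Distinct objects\n" if $emp1 != $emp2;

# Updating one object does not affect the other
$emp1->{age}++;
print "$emp1->{name} is $emp1->{age}, $emp2->{name} is $emp2->{age}\n";
```

Comparing two references numerically compares their addresses, which is why != is enough to show that the objects are distinct.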

7.2.3 Behavior

All functions that access or update one or more attributes of the object constitute the behavior of the object.

Consider

sub promote_employee {
   my $r_employee = shift;
   $r_employee->{"position"} = 
       lookup_next_position($r_employee->{"position"});
}

# To use it
promote_employee($emp1);

Such functions are also called instance methods in OO circles because they require a specific instance of an object; an employee, in this case.

To avoid having to add the suffix "_employee" to every function name, we put all these functions in a package of their own, called Employee:

package Employee;
sub new {   # No need for the suffix.
    # ...
}
sub promote {
    # ...
}

To use this module, you need to say:

$emp = Employee::new("John Doe", 32, "Software Engineer");
Employee::promote($emp);

As you can see, this code is beginning to encapsulate a class called Employee: the user of this code invokes only the interface functions new and promote and does not know or care about the type of data structure used to store employee details, or, as we mentioned earlier, whether a database is being used behind the scenes.
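Here is one way the elided bodies might be filled in, as a minimal sketch. The text never defines lookup_next_position(), so the career ladder in %next_position is purely hypothetical.

```perl
use strict;
use warnings;

package Employee;

# Hypothetical career ladder; the text never defines lookup_next_position()
my %next_position = (
    "Software Engineer" => "Senior Engineer",
    "Senior Engineer"   => "Manager",
);

sub lookup_next_position {
    my $position = shift;
    # Stay put if there is no higher rung in the table
    return exists $next_position{$position} ? $next_position{$position}
                                            : $position;
}

sub new {
    my ($name, $age, $starting_position) = @_;
    return { name => $name, age => $age, position => $starting_position };
}

sub promote {
    my $r_employee = shift;
    $r_employee->{position} = lookup_next_position($r_employee->{position});
}

package main;

my $emp = Employee::new("John Doe", 32, "Software Engineer");
Employee::promote($emp);
print "New position: $emp->{position}\n";
```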

7.2.4 The Need for Polymorphism

What we have seen thus far is the kind of stuff that a C programmer would do, except that he or she would likely use a struct to keep track of the attributes. This is precisely the way the stdio library works, for example. fopen() is a constructor that returns a pointer to a unique FILE structure, allocated dynamically. The pointer (the object reference) is supplied to other methods like fgets() and fprintf().

Unfortunately, complications arise when the problem gets more involved. Let us say we have to keep information about hourly and regular employees. Hourly employees get paid by the hour and are eligible for overtime pay, while regular employees get a monthly salary. One way to approach it is to create a new function per type of employee:

package Employee;
# Creating Regular Employees
sub new_regular {
    my ($name, $age, $starting_position, $monthly_salary) = @_;
    my $employee = {
        "name"           => $name,
        "age"            => $age,
        "position"       => $starting_position,
        "monthly_salary" => $monthly_salary,
    };
    return $employee;  # return the object reference
}
# Hourly Employees
sub new_hourly {
    my ($name, $age, $starting_position, 
        $hourly_rate, $overtime_rate) = @_;
    my $employee = {
        "name"          => $name,
        "age"           => $age,
        "position"      => $starting_position,
        "hourly_rate"   => $hourly_rate,
        "overtime_rate" => $overtime_rate
    };
    return $employee;  # return the object reference
}

Now, if we want to get an employee's year-to-date salary, we have to make a distinction between the two types of employees. We could provide the two subroutines compute_hourly_ytd_income() and compute_regular_ytd_income(), but of course the story doesn't end there. Other differences between hourly and regular employees (such as allowed vacation, medical benefits, and so on) or the introduction of other types of employees (such as temporary employees) results in a combinatorial explosion of functions. Worse, the interface requires the user of this package to make a distinction between types of employees to be able to call the right function.

To get us out of this bind, we put different types of employees in different packages. Then we use the bless keyword to tag objects internally with a pointer to the packages they belong to. The following example shows the changes from the code presented above, most notably the per-type packages and the calls to bless (explanations follow):

#-------------------------------------------------------------
package RegularEmployee;
sub new {
    my ($name, $age, $starting_position, $monthly_salary) = @_;
    my $r_employee = {                        
        "name"           => $name,            
        "age"            => $age,
        "position"       => $starting_position,
        "monthly_salary" => $monthly_salary,
        "months_worked"  => 0,
    };
    bless $r_employee, 'RegularEmployee';   # Tag object with pkg name
    return $r_employee;                     # Return object
}
sub promote {
   #...
}
sub compute_ytd_income{
   my $r_emp = shift;
   # Assume the months_worked attribute got modified at some point
   return $r_emp->{'monthly_salary'} * $r_emp->{'months_worked'};
}

#-------------------------------------------------------------
package HourlyEmployee;
sub new {
    my ($name, $age, $starting_position, 
        $hourly_rate, $overtime_rate) = @_;
    my $r_employee = {
        "name"          => $name,
        "age"           => $age,
        "position"      => $starting_position,
        "hourly_rate"   => $hourly_rate,
        "overtime_rate" => $overtime_rate
    };
    bless $r_employee, 'HourlyEmployee';
    return $r_employee;
}
sub promote {
   #...
}
sub compute_ytd_income {
   my ($r_emp) = $_[0];
   # Assume hours_worked and overtime_hours_worked got modified at some point
   return $r_emp->{'hourly_rate'} * $r_emp->{'hours_worked'}
      + $r_emp->{'overtime_rate'} * $r_emp->{'overtime_hours_worked'};
}

bless is given an ordinary reference to a data structure. It tags that data structure (note: not the reference[1]) as belonging to a specific package and thus bestows on it some more powers, as we shall soon see. bless is to our hash table what baptism is to a child. It doesn't change the data structure in any way (which still remains a hash table), just as baptism doesn't really alter a person except to give them an additional identity.

[1] The reference is like a void * in C. The object is typed, not the C pointer or Perl reference.
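A short sketch of the claim that bless tags the data structure rather than the reference: a second reference taken before the call to bless still reports the new class afterward. The variable names are illustrative.

```perl
use strict;
use warnings;

my $r_employee = { name => "John Doe" };
my $alias      = $r_employee;          # second reference to the same hash

my $before = ref($r_employee);         # "HASH": a plain hash reference
bless $r_employee, 'RegularEmployee';
my $after  = ref($alias);              # the alias sees the tag too

print "before: $before, after: $after\n";
print "still a hash: $alias->{name}\n";
```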

The nice thing about bless is that it gives us a direct way of using this object. Here's how:

# First create two objects as before.
$emp1 = RegularEmployee::new('John Doe', 32,    # Polymorphism
                                'Software Engineer', 5000);
$emp2 = HourlyEmployee::new('Jane Smith', 35,   # Polymorphism
                               'Auditor', 65, 90);

Now use the arrow notation to directly invoke instance methods, or, as they say in OO-land, invoke methods on the object:

# Direct invocation
$emp1->promote(); 
$emp2->compute_ytd_income();

When Perl sees $emp1->promote(), it determines the class to which $emp1 belongs (the one under which it has been blessed). In this case, it is RegularEmployee. Perl then calls this function as follows: RegularEmployee::promote($emp1). In other words, the object on the left side of the arrow is simply given as the first parameter of the appropriate subroutine.

Both the :: and -> notations are in fact permissible, unlike in C++. The arrow notation is more flexible because Perl figures out the class at run time, while the :: notation is faster because the function to be called is known at compile time. There is nothing magical about an instance method in Perl. It is an ordinary subroutine whose first parameter simply happens to be an object reference. (You might have noticed that the promote method did not change from the previous section.)
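The equivalence of the two notations can be seen in a small sketch. The constructor takes a trimmed-down attribute list for brevity, and months_worked is set directly only to give the computation something to work with.

```perl
use strict;
use warnings;

package RegularEmployee;
sub new {
    my ($name, $monthly_salary) = @_;      # trimmed-down attribute list
    return bless { name => $name, monthly_salary => $monthly_salary,
                   months_worked => 0 }, 'RegularEmployee';
}
sub compute_ytd_income {
    my $r_emp = shift;
    return $r_emp->{monthly_salary} * $r_emp->{months_worked};
}

package main;

my $emp = RegularEmployee::new('John Doe', 5000);
$emp->{months_worked} = 3;

# Same subroutine, same argument, two spellings
my $via_arrow  = $emp->compute_ytd_income();
my $via_colons = RegularEmployee::compute_ytd_income($emp);
print "arrow: $via_arrow, colons: $via_colons\n";
```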

So is this mere syntactic sugar? In the end, all we seem to have achieved is the ability to call an instance method of an object through an alternate notation.

No, we have gained an important advantage. The module user doesn't have to discriminate between types of objects using an if statement but instead lets Perl take care of routing a call to the appropriate function. That is, instead of saying something like

if (ref($emp) eq "HourlyEmployee") {
    $income = HourlyEmployee::compute_ytd_income($emp);
} else {
    $income = RegularEmployee::compute_ytd_income($emp);
}

we can simply say,

$income = $emp->compute_ytd_income();

This ability of Perl to call the appropriate module's function is called run-time binding. Incidentally, recall from Chapter 1, Data References and Anonymous Storage, that the ref function returns a string indicating the type of the entity pointed to by the reference; in the case of a blessed object reference, it returns the name of the corresponding class.

Note that while processing payroll records, $emp can be a regular employee in one iteration and an hourly employee in another. This feature is called polymorphism (poly + morph = the ability of an object to take on many forms).

Polymorphism and run-time binding are the chief contributions of object-oriented languages. They give a system an enormous amount of flexibility because you can now add a new type of employee (with the same interface as the other types) without having to change the payroll-processing code. This is possible because each object "knows" how to compute its own year-to-date income. It pays to remember this cardinal rule:

It is indicative of inflexible procedural design if you find yourself using conditional statements to distinguish between object types.

The design is flexible also because you can add new methods to any of the packages without hurting what is already present.
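A payroll loop along these lines might look as follows. This is a sketch only: the constructors take simplified argument lists, and overtime is left out to keep the arithmetic short.

```perl
use strict;
use warnings;

package RegularEmployee;
sub new {
    my ($name, $monthly_salary, $months_worked) = @_;
    return bless { name => $name, monthly_salary => $monthly_salary,
                   months_worked => $months_worked }, 'RegularEmployee';
}
sub compute_ytd_income {
    my $r_emp = shift;
    return $r_emp->{monthly_salary} * $r_emp->{months_worked};
}

package HourlyEmployee;
sub new {
    my ($name, $hourly_rate, $hours_worked) = @_;
    return bless { name => $name, hourly_rate => $hourly_rate,
                   hours_worked => $hours_worked }, 'HourlyEmployee';
}
sub compute_ytd_income {
    my $r_emp = shift;
    return $r_emp->{hourly_rate} * $r_emp->{hours_worked};
}

package main;

my @staff = (
    RegularEmployee::new('John Doe',  5000, 2),
    HourlyEmployee::new('Jane Smith', 65,   100),
);

my $total = 0;
foreach my $emp (@staff) {
    # No if statement here: Perl routes the call by the object's class
    my $income = $emp->compute_ytd_income();
    print "$emp->{name} (", ref($emp), "): $income\n";
    $total += $income;
}
print "Total payroll: $total\n";
```

Adding a third kind of employee would mean adding a package with its own compute_ytd_income(); the loop itself would not change.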

7.2.5 Class Methods and Attributes

Class attributes are properties that pertain to all instances of a class, but don't vary on a per-employee basis. For example, one insurance company might provide health coverage for all employees, so it doesn't make sense to store the name of this company in each and every employee.

Class methods (also known as static methods) are functions that are relevant to that class but don't need a specific object instance to work with. For example, a subroutine called get_employee_names() doesn't require an employee object as input to figure out what it has to do.

Perl has no specific syntax for class attributes and methods, unlike C++ or Java. Class attributes are simply package global variables, and class methods are ordinary subroutines that don't work on any specific instance. Perl supports polymorphism and run-time binding for these ordinary subroutines (not just instance methods), which can be leveraged to produce a truly flexible design. Consider

$record = <STDIN>; # Tab delimited record containing employee details
chomp($record);    # Strip the trailing newline
($type, $name, $age, $position) = split(/\t/, $record);

# Create an employee object of the appropriate class
$emp = $type->new($name, $age, $position);

# Now use the object as before
$emp->compute_ytd_income();

In this example, $type can contain either of these two strings: "HourlyEmployee" or "RegularEmployee". Note that this variable is not an object; it is simply the name of a class. This approach improves on the example in the previous section by avoiding having to hardcode the name of the package. Why is that an improvement? Well, if you didn't have this facility, you would have had to say something like this to create an appropriately typed object:

if ($type eq "HourlyEmployee") {
     $emp = HourlyEmployee->new(....);
} else {
     $emp = RegularEmployee->new(....);
}

Any piece of code that explicitly depends upon checking the class or type of an object requires too much maintenance. If you introduce a new type of employee tomorrow, you'll have to go back and add the new type to all such pieces of code.

Recall that in the case of an instance method, the object to the left of the arrow is passed as the first parameter to the subroutine. It is no different here. The procedure HourlyEmployee::new must be rewritten to expect this:

package HourlyEmployee;
sub new {
    my ($pkg, $name, $age, $starting_position,
        $hourly_rate, $overtime_rate) = @_;
    # ... create, bless, and return the object as before
}
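Putting the pieces together, a class name held in a variable can drive the constructor call. In this sketch the record is a hard-coded string standing in for a line read from STDIN, the attribute lists are shortened, and blessing into $pkg rather than a literal class name is an assumption made here for brevity.

```perl
use strict;
use warnings;

package RegularEmployee;
sub new {
    my ($pkg, $name, $age, $position) = @_;
    return bless { name => $name, age => $age, position => $position }, $pkg;
}

package HourlyEmployee;
sub new {
    my ($pkg, $name, $age, $position) = @_;
    return bless { name => $name, age => $age, position => $position }, $pkg;
}

package main;

# A hard-coded record standing in for a line read from STDIN
my $record = "HourlyEmployee\tJane Smith\t35\tAuditor\n";
chomp $record;
my ($type, $name, $age, $position) = split(/\t/, $record);

# $type holds a class name, so this is the same as HourlyEmployee->new(...)
my $emp = $type->new($name, $age, $position);
print ref($emp), ": $emp->{name}\n";
```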

Given that both instance and class methods are ordinary subroutines, you can always write a subroutine that can function as either, by checking the type of the first parameter supplied to it. Consider the following constructor, which creates a new object or a clone of an existing one, depending on how it is invoked:

package Employee;
sub new {
    my $arg = shift;
    if (ref($arg)) { 
        # Called as $emp->new(): Clone the Employee given to it
        #....
    } else {
        # Called as Employee->new():  Create a new employee
        #...
    }
}

You can now use this method as follows:

# Using new() as a class method
$emp1 = Employee->new("John Doe", 20, "Vice President");

# Using new() as an instance method to clone the employee details
$emp2 = $emp1->new();

I'll leave it up to you to answer why you might want to clone an employee!
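One possible way to fill in the two branches is sketched below. The shallow copy used for cloning is an assumption; nested data structures, if any, would still be shared between the original and the clone.

```perl
use strict;
use warnings;

package Employee;
sub new {
    my $arg = shift;
    if (ref($arg)) {
        # Called as $emp->new(): shallow-copy the attributes of $arg
        return bless { %$arg }, ref($arg);
    } else {
        # Called as Employee->new(...): build a fresh object
        my ($name, $age, $position) = @_;
        return bless { name => $name, age => $age,
                       position => $position }, $arg;
    }
}

package main;

my $emp1 = Employee->new("John Doe", 20, "Vice President");
my $emp2 = $emp1->new();

$emp2->{name} = "John Doe II";   # changing the clone leaves the original alone
print "$emp1->{name} / $emp2->{name}\n";
```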

What have we learned in this section? If we write all our class methods to expect the name of the module as the first parameter, we make it possible for the module's user to employ run-time binding and polymorphism. We will follow this practice from now on.

You might be curious why a class method needs to be supplied the name of its own module. We'll answer this shortly when we deal with inheritance.
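For completeness, here is a sketch of class attributes and a class method written as ordinary package variables and subroutines. The insurer name and the @names registry are hypothetical; the text only names get_employee_names() without showing it.

```perl
use strict;
use warnings;

package Employee;

# Class attributes: package variables shared by every Employee
our $insurer = "Acme Health";      # hypothetical company name
our @names;                        # registry used by get_employee_names()

sub new {
    my ($pkg, $name) = @_;
    push @names, $name;
    return bless { name => $name }, $pkg;
}

# Class method: receives the class name, never a particular object
sub get_employee_names {
    my $pkg = shift;
    return @names;
}

package main;

Employee->new("John Doe");
Employee->new("Norma Jean");

my @all = Employee->get_employee_names();
print "Insured by $Employee::insurer: @all\n";
```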

7.2.5.1 Detour: The indirect notation

Perl wouldn't be Perl if there weren't a couple of alternatives to suit everyone's fancy. It supports an alternative to the arrow notation, called the indirect notation, in which the function name precedes the object or class name. An example should make this clear:

$emp = new Employee ("John Doe", 20, "Vice President");

C++ folks will identify with this notation. This approach can be used for objects too:

promote $emp "Chairman", 100000; # Give him a promotion and a raise

Notice that there is no comma between $emp and the first argument ("Chairman"). This is how Perl knows that you are calling a method using the indirect notation and not calling a subroutine in the current package. Perhaps you will identify more with the following example:

use FileHandle;
$fh = new FileHandle("> foo.txt");
print $fh "foo bar\n";

print is a method on the FileHandle module.

While the indirect notation has the same effect as the arrow notation, it cannot be used in a chain of calls. The following is possible only with the arrow notation:

use FileHandle;
FileHandle->new("> foo.txt")->print("foo bar\n"); # Chain of calls

7.2.6 The Need for Inheritance

Perl allows a module to specify a list of other module names, in a special array called @ISA. When it does not find a particular class or instance method in a module, it looks to see if that module's @ISA has been initialized. If so, it checks to see if any of those modules support the missing function, picks the first one it can find, and passes control to it. This feature is called inheritance. Consider

package Man;
@ISA = qw(Mammal Social_Animal);

This allows us to specify that Man is-a Mammal and is-a Social_Animal. All traits (read: methods) common to mammals are supported in the Mammal class and don't have to be implemented in Man too. Let us look at a more practical example.
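The lookup order can be seen in a tiny sketch. The methods nurse() and greet() are invented for illustration; only the package names and the @ISA line come from the text.

```perl
use strict;
use warnings;

package Mammal;
sub nurse { return "nurses its young" }

package Social_Animal;
sub greet { return "greets its neighbors" }

package Man;
our @ISA = qw(Mammal Social_Animal);
sub new { my $pkg = shift; return bless {}, $pkg }

package main;

my $man   = Man->new();
my $nurse = $man->nurse();   # not in Man; found in Mammal
my $greet = $man->greet();   # not in Man or Mammal; found in Social_Animal
print "A man $nurse and $greet\n";
```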

In our attempts to distinguish between hourly and regular employees, we have gone to the other extreme and made them completely independent. Clearly, there are a number of common attributes (name, age, and position) and behavior (promote, say) that they all share as employees. We can thus use inheritance to "pull out" the common aspects into a superclass (or base class) called Employee:

#---------------------------------------------------------
package Employee; #Base class
#---------------------------------------------------------
sub allocate{
    my ($pkg, $name, $age, $starting_position) = @_;
    my $r_employee = bless {
        "name"           => $name,            
        "age"            => $age,
        "position"       => $starting_position
    }, $pkg; 
    return $r_employee;
}
sub promote {
   my $r_employee            = shift;
   my $current_position      = $r_employee->{"position"};
   my $next_position         = lookup_next_position($current_position);
   $r_employee->{"position"} = $next_position;
}
#---------------------------------------------------------
package HourlyEmployee;
#---------------------------------------------------------
@ISA = ("Employee"); # Inherits from Employee
sub new {
    my ($pkg, $name, $age, $starting_position, 
        $hourly_rate, $overtime_rate) = @_;
    # Let the Employee package create and bless the object
    my $r_employee = $pkg->allocate($name, $age, 
                                    $starting_position);
    # Add HourlyEmployee-specific attributes and we are done.
    $r_employee->{"hourly_rate"}   = $hourly_rate;
    $r_employee->{"overtime_rate"} = $overtime_rate;
    return $r_employee; # return the object reference
}
sub compute_ytd_income {
    # ...
}
# ... And similarly for package RegularEmployee

Whatever is common to all employees is implemented in the base class. Since both HourlyEmployee and RegularEmployee need a class method called new() to allocate a hash table, to bless it, and to insert common attributes into this table, we factor this functionality out into an inheritable subroutine called allocate in module Employee.

Notice how allocate avoids hardcoding the name of a class, thus ensuring maximum reusability. HourlyEmployee::new() calls $pkg->allocate, which means that the first parameter to allocate, $pkg, has the value HourlyEmployee. allocate uses this to bless the object directly into the derived class. HourlyEmployee::new doesn't need to create the object anymore; it just has to insert its own specific attributes.

Nothing has changed from the user's point of view. You still say,

$emp = HourlyEmployee->new(....);

But we have now managed to eliminate redundant code in the modules and left them open for future enhancements.
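The RegularEmployee counterpart, which the listing above leaves out, might look like this. It is a sketch that follows the same pattern as HourlyEmployee::new; the attribute names are taken from the earlier RegularEmployee example.

```perl
use strict;
use warnings;

package Employee;                    # base class, as in the text
sub allocate {
    my ($pkg, $name, $age, $starting_position) = @_;
    return bless { name => $name, age => $age,
                   position => $starting_position }, $pkg;
}

package RegularEmployee;
our @ISA = ("Employee");             # inherits allocate()
sub new {
    my ($pkg, $name, $age, $starting_position, $monthly_salary) = @_;
    # Let the Employee package create and bless the object
    my $r_employee = $pkg->allocate($name, $age, $starting_position);
    # Add RegularEmployee-specific attributes
    $r_employee->{monthly_salary} = $monthly_salary;
    $r_employee->{months_worked}  = 0;
    return $r_employee;
}
sub compute_ytd_income {
    my $r_emp = shift;
    return $r_emp->{monthly_salary} * $r_emp->{months_worked};
}

package main;

my $emp = RegularEmployee->new("John Doe", 32, "Software Engineer", 5000);
print ref($emp), " in position $emp->{position}\n";
```

Although allocate() lives in Employee, the object ends up blessed into RegularEmployee, because that is the class name the call was made through.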

7.2.6.1 Overriding base classes

Let us say we wanted to ensure that hourly employees never rise above the level of a manager. The following example overrides the base class's promote() method to do this check:

package HourlyEmployee;
sub promote {
    my $obj = shift;
    die "Hourly Employees cannot be promoted beyond 'Manager'"
           if ($obj->{position} eq 'Manager');
    # call base class's promote
    $obj->Employee::promote(); #Specify the package explicitly
}

This syntax tells Perl to start the search for promote() in the @ISA hierarchy, starting from Employee. A small problem here is that by hardcoding the name of a class (Employee), we make it difficult for us to change our mind about the inheritance hierarchy. To avoid this, Perl provides a pseudoclass called SUPER, like Smalltalk, so that you can say,

    $obj->SUPER::promote();

This searches the @ISA hierarchy for the appropriate promote subroutine. Now, if we interpose another package between Employee and HourlyEmployee in the inheritance hierarchy, we just need to update HourlyEmployee's @ISA array.
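A runnable sketch of the override with SUPER follows. The base class's promote() is reduced to a hypothetical one-step ladder in place of lookup_next_position(), and the constructor takes only a position.

```perl
use strict;
use warnings;

package Employee;
sub new {
    my ($pkg, $position) = @_;
    return bless { position => $position }, $pkg;
}
sub promote {
    my $r_employee = shift;
    # Hypothetical one-step ladder standing in for lookup_next_position()
    $r_employee->{position} = 'Manager';
}

package HourlyEmployee;
our @ISA = ("Employee");
sub promote {
    my $obj = shift;
    die "Hourly Employees cannot be promoted beyond 'Manager'\n"
        if $obj->{position} eq 'Manager';
    $obj->SUPER::promote();          # resolved through HourlyEmployee's @ISA
}

package main;

my $emp = HourlyEmployee->new('Clerk');
$emp->promote();                     # Clerk -> Manager via Employee::promote
my $second_try = eval { $emp->promote(); 1 } ? "allowed" : "refused";
print "$emp->{position}, second promotion $second_try\n";
```

SUPER is resolved relative to the package in which the calling subroutine was compiled, so it works here without naming Employee anywhere in HourlyEmployee::promote.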

NOTE: We have now gradually eliminated the need for the :: notation to call a module's subroutines. A subroutine either is imported directly into your namespace, in which case you don't need to fully qualify its name, or is invoked by using the -> notation. You still need to use "::" to access a foreign package's variables.

7.2.7 Object Destruction

Perl automatically garbage collects a data structure when its reference count drops to zero. If a data structure has been blessed into a module, Perl allows that module to perform some clean-up before it destroys the object, by calling a special procedure in that module called DESTROY and passing it the reference to the object to be destroyed:

package Employee;
sub DESTROY {
     my ($emp) = @_;
     print "Alas, ", $emp->{"name"}, " is now no longer with us \n";
}

This is similar to C++'s destructor or the finalize() method in Java in that Perl does the memory management automatically, but you get a chance to do something before the object is reclaimed. (Unlike Java's finalize, Perl's garbage collection is deterministic; DESTROY is called as soon as the object is not being referred to any more.)
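The deterministic timing can be observed with a small sketch that records events in an array instead of printing from DESTROY. The @log array is only an illustration device.

```perl
use strict;
use warnings;

my @log;                             # records the order of events

package Employee;
sub new {
    my ($pkg, $name) = @_;
    return bless { name => $name }, $pkg;
}
sub DESTROY {
    my ($emp) = @_;
    push @log, "destroyed $emp->{name}";
}

package main;

{
    my $emp = Employee->new("John Doe");
    push @log, "in scope";
}                                    # last reference disappears here
push @log, "after scope";

print join(", ", @log), "\n";
```

The "destroyed" entry lands between the other two, because DESTROY runs the moment the block's only reference goes away rather than at some later collection pass.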

Note that you are not compelled to declare this subroutine; you do so only if you have some clean-up work to be done. In a module such as Socket, you would close the corresponding connection, but in something like Employee, where no external system resources are being held up, you don't have to provide a DESTROY method. But recall that AUTOLOAD is called if a function is not found. In the case in which you supply AUTOLOAD but not the DESTROY method, you might want to ensure that AUTOLOAD checks for this possibility:

sub AUTOLOAD {
    my $obj = $_[0];
    # $AUTOLOAD contains the name of the missing method

    # Never propagate DESTROY methods
    return if $AUTOLOAD =~ /::DESTROY$/;
    # ....
}
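To see the guard at work, here is a sketch in which AUTOLOAD serves any missing method as a read-only attribute lookup. That behavior is an assumption chosen for illustration, and the counter exists only to show that the DESTROY lookup is ignored.

```perl
use strict;
use warnings;

package Employee;
our $AUTOLOAD;
my $autoload_calls = 0;              # counts calls that get past the guard

sub new {
    my ($pkg, %attrs) = @_;
    return bless { %attrs }, $pkg;
}

sub AUTOLOAD {
    my $obj = $_[0];
    # Never propagate DESTROY methods
    return if $AUTOLOAD =~ /::DESTROY$/;
    $autoload_calls++;
    # Treat any other missing method as a read-only attribute lookup
    (my $attr = $AUTOLOAD) =~ s/.*:://;
    die "No such attribute: $attr\n" unless exists $obj->{$attr};
    return $obj->{$attr};
}

package main;

my $name;
{
    my $emp = Employee->new(name => "John Doe");
    $name = $emp->name();            # handled by AUTOLOAD
}                                    # DESTROY lookup reaches AUTOLOAD, is ignored
print "name: $name, AUTOLOAD calls: $autoload_calls\n";
```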

7.2.8 Accessor Methods

According to Rumbaugh et al. [15]:

Encapsulation can be violated when code associated with one class directly accesses the attributes of another class. Direct access makes assumptions about storage format and location of the data. These details must be hidden within the class....The proper way to access an attribute of another object is to "ask for it" by invoking an operation on the object, rather than simply "taking it."

This is as true for classes related by inheritance as for unrelated classes.

To discourage direct access to an object's attributes, we provide "accessor methods." These two methods read and update the "position" attribute of an employee:

$pos = $emp->get_position();                 # read attribute
$emp->set_position("Software Engineer");     # write attribute

The more popular convention is to have one method to handle both read and write access:

$pos = $emp->position();                     # read attribute
$emp->position("Software Engineer");         # write attribute

This is how the module might implement it:

package Employee;
sub position {
    my $obj = shift;
    @_ ? $obj->{position} = shift            # modify attribute
       : $obj->{position};                   # retrieve attribute
}

Note that the method returns the latest value of the position attribute in both cases, because in both cases (get and set), the expression $obj->{position} is the last to be evaluated.
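A quick sketch exercising the combined accessor in both roles. The one-argument constructor is a simplification made for this example.

```perl
use strict;
use warnings;

package Employee;
sub new {
    my ($pkg, $position) = @_;
    return bless { position => $position }, $pkg;
}
sub position {
    my $obj = shift;
    @_ ? $obj->{position} = shift     # modify attribute
       : $obj->{position};            # retrieve attribute
}

package main;

my $emp     = Employee->new("Software Engineer");
my $read    = $emp->position();                    # get
my $written = $emp->position("Senior Engineer");   # set, returns new value
print "$read -> $written (now ", $emp->position(), ")\n";
```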

It might seem a complete waste of time to call a method every time you need to touch an attribute. But, as it happens, accessor methods are absolutely necessary in designing for change. Consider the following advantages:

Encapsulation

Accessor methods hide how object attributes are stored. If you change the way this layout is done, only these methods need to be modified; the rest of the code, including derived classes, remains untouched. In Perl, as in other OO scripting languages, in which reengineering may be necessary for performance or space efficiency, accessor methods are a good thing. Smalltalk, CORBA (Common Object Request Broker Architecture), and ActiveX are other well-known cases in which the only way to an attribute is through an accessor.

Side effects

Accessor methods are sometimes used for triggering actions in addition to retrieving or updating the attribute. GUI toolkits use this idiom routinely. For example:

     $button->foreground_color('yellow');

This not only changes the value of the foreground color attribute, but updates the screen too.

Access checking

Accessor methods can be made to disallow updates. For example, primary key attributes such as an employee's name should not be updatable once created; an accessor can easily enforce this.

Computed attributes

An employee's income can be seen as an attribute, though internally it needs to be computed. Instead of writing a method like compute_ytd_income(), you simply call it income(). This makes it look like an attribute accessor, and it can disallow updates to this attribute.
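Access checking and computed attributes can be sketched as follows. The error messages and the simplified constructor are assumptions; the text names income() but shows no implementation.

```perl
use strict;
use warnings;

package HourlyEmployee;
sub new {
    my ($pkg, $name, $hourly_rate) = @_;
    return bless { name => $name, hourly_rate => $hourly_rate,
                   hours_worked => 0 }, $pkg;
}

# Access checking: the name is fixed once the object exists
sub name {
    my $obj = shift;
    die "name is read-only\n" if @_;
    return $obj->{name};
}

# Ordinary read/write accessor
sub hours_worked {
    my $obj = shift;
    $obj->{hours_worked} = shift if @_;
    return $obj->{hours_worked};
}

# Computed attribute: looks like an accessor, but nothing is stored
sub income {
    my $obj = shift;
    die "income is computed and cannot be set\n" if @_;
    return $obj->{hourly_rate} * $obj->{hours_worked};
}

package main;

my $emp = HourlyEmployee->new("Jane Smith", 65);
$emp->hours_worked(10);
my $income   = $emp->income();
my $rejected = eval { $emp->name("Someone Else"); 1 } ? 0 : 1;
print $emp->name(), " earned $income; rename rejected: $rejected\n";
```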

Moral of the story: Get in the habit of writing accessor methods. In the next chapter, we will study a module called ObjectTemplate, a standard library called Class::Template, and a module on CPAN called MethodMaker, all of which automatically create accessor methods for you, so there is really no reason not to use such methods.

Caveat: Even if your attributes are wrapped in accessor methods, you should be wary of unrelated classes using these methods. When reviewing a piece of code, always look for the real intention behind these accesses; sometimes it may be better to provide other methods that make this access unnecessary. For example, a user should always use $emp->promote() instead of directly updating the position attribute.