[Chapter 5] 5.4 Using Tied Variables

5.4 Using Tied Variables

In older versions of Perl, a user could call dbmopen to tie a hash to a UNIX DBM file. Whenever the hash was accessed, the database file on disk (really just a hash, not a full relational database) would be magically[16] read from or written to. In modern versions of Perl, you can bind any ordinary variable (scalar, array, or hash) to an implementation class by using tie. (The class may or may not implement a DBM file.) You can break this association with untie.

[16] In this case, magically means "transparently doing something very complicated". You know the old saying - any technology sufficiently advanced is indistinguishable from a Perl script.

The tie function creates the association by creating an object internally to represent the variable to the class. If you have a tied variable, but want to get at the underlying object, there are two ways to do it. First, the tie function returns a reference to the object. But if you didn't bother to store that object reference anywhere, you could still retrieve it using the tied function.

$object = tie VARIABLE, CLASSNAME, LIST
untie VARIABLE
$object = tied VARIABLE

The tie function binds the variable to the class package that provides the methods for that variable. Once this magic has been performed, accessing a tied variable automatically triggers method calls in the proper class. All the complexity of the class is hidden behind magic method calls. The method names are predetermined, since they're called implicitly from within the innards of Perl. These names are in ALL CAPS, which is a convention in Perl culture that indicates that the routines are called implicitly rather than explicitly - just like BEGIN, END, and DESTROY. And AUTOLOAD too, for that matter.

You can almost think of tie as a funny kind of bless, except that it blesses a bare variable instead of a thingy reference, and takes extra parameters, like a constructor. That's because it actually does call a constructor internally. (That's one of the magic methods we mentioned.) This constructor is passed the CLASSNAME you specified, as well as any additional arguments you supply in the LIST. It is not passed the VARIABLE, however. The only way the constructor can tell which kind of VARIABLE is being tied is by knowing its own method name. This is not the customary constructor name, new, but rather one of TIESCALAR, TIEARRAY, or TIEHASH. (You can likely figure out which name goes with which variable type.) The constructor just returns an object reference in the normal fashion, and doesn't worry about whether it was called from tie - which it may not have been, since you can call these methods directly if you like. (Indeed, if you've tied your variable to a class that provides other methods not accessible through the variable, you must call the other methods directly yourself, via the object reference. These extra methods might provide services like file locking or other forms of transaction protection.)

As in any constructor, these constructors must bless a reference to a thingy and return it as the implementation object. The thingy inside the implementation object doesn't have to be of the same type as the variable you're tying to. It does have to be a properly blessed object, though. See the example below on tied arrays, which uses a hash object to hold information about an array.

The tie function will not use or require a module for you - you must do that explicitly yourself. (On the other hand, the dbmopen emulator function will, for backward compatibility, attempt to use one or another DBM implementation. But you can preempt its selection with an explicit use, provided the module you use is one of the modules in dbmopen's list of modules to try. See the AnyDBM_File module in Chapter 7 for a fuller explanation.)

5.4.1 Tying Scalars

A class implementing a tied scalar must define the following methods: TIESCALAR, FETCH, STORE, and possibly DESTROY. These routines will be invoked implicitly when you tie a variable (TIESCALAR), read a tied variable (FETCH), or assign a value to a tied variable (STORE). The DESTROY method is called (as always) when the last reference to the object disappears. (This may or may not happen when you call untie, which destroys the reference used by the tie, but doesn't destroy any outstanding references you may have squirreled away elsewhere.) The FETCH and STORE methods are triggered when you access the variable that's been tied, not the object it's been tied to. If you have a handle on the object (either returned by the initial tie or retrieved later via tied), you can access the underlying object yourself without automatically triggering its FETCH or STORE methods.

Let's look at each of these methods in turn, using as our example an imaginary class called Nice.[17] Variables tied to this class are scalars containing process priorities, and each such variable is implicitly associated with an object that contains a particular process ID, such the ID of the currently running process or of the parent process. (Presumably you'd name your variables to remind you which process you're referring to.) Variables are tied to the class this way:

[17] UNIX priorities are associated with the word "nice" because they're inverted from what you'd expect. Higher priorities run slower, hence are "nicer" to other processes. A more portable module might prefer a less UNIX-centric name like Priority. But if we were writing this class for the Perl library, we'd probably call it Tie::Priority or some such, to fit the library's hierarchical naming scheme. Not everything can be a top-level class, or things will get rather confused. Not to mention people.

use Nice;       # load the Nice.pm module

tie $his_speed, 'Nice', getppid();
tie $my_speed,  'Nice', $$;

Once the variables have been tied, their previous contents are no longer accessible. The internally forged connection between the variable and the object takes precedence over ordinary variable semantics.

For example, let's say you copy a variable that's been tied:

$speed = $his_speed;

Instead of reading the value in the ordinary fashion from the $his_speed scalar variable, Perl implicitly calls the FETCH method on the associated underlying object. It's as though you'd written this:

$speed = (tied $his_speed)->FETCH():

Or if you'd captured the object returned by the tie, you could simply use that reference instead of the tied function, as in the following sample code.

$myobj = tie $my_speed, 'Nice', $$;
$speed = $my_speed;       # through the implicit interface
$speed = $myobj->FETCH(); # same thing, explicitly

You can use $myobj to call methods other than the implicit ones, such as those provided by the DB_File class (see Chapter 7). However, one normally minds one's own business and leaves the underlying object alone, which is why you often see the return value from tie ignored. You can still get at it if you need it later.

That's the external view of it. For our implementation, we'll use the BSD::Resource class (found in CPAN, but not included with Perl) to access the PRIO_PROCESS, PRIO_MIN, and PRIO_MAX constants from your system. Here's the preamble of our class, which we will put into a file named Nice.pm:

package Nice;
use Carp;            # Propagates error messages nicely.
use BSD::Resource;   # Use these hooks into the OS.
use strict;          # Enforce some discipline on ourselves,
use vars '$DEBUG';   # but exempt $DEBUG from discipline.

The Carp module provides methods carp(), croak(), and confess(), which we'll use in various spots below. As usual, see Chapter 7 for more about Carp.

The use strict would ordinarily disallow the use of unqualified package variables like $DEBUG, but we then declared the global with use vars, so it's exempt. Otherwise we'd have to say $Nice::DEBUG everywhere. But it is a global, and other modules can turn on debugging in our module by setting $Nice::DEBUG to some other value before using our module.

TIESCALAR CLASSNAME, LIST

The TIESCALAR method of the class (that is, the class package, but we're going to stop reminding you of that) is implicitly invoked whenever tie is called on a scalar variable. The LIST contains any optional parameters needed to properly initialize an object of the given class. (In our example, there is only one parameter, the process ID.) The method is expected to return an object, which may or may not contain an anonymous scalar as its blessed thingy. In our example, it does.

sub TIESCALAR {
    my $class = shift;
    my $pid   = shift;

    $pid ||= $$;              # arg of 0 defaults to my process

    if ($pid =~ /\D/) {
        carp "Nice::TIESCALAR got non-numeric pid $pid" if $^W;
        return undef;
    }

    unless (kill 0, $pid) {   # EPERM or ERSCH, no doubt
        carp "Nice::TIESCALAR got bad pid $pid: $!" if $^W;
        return undef;
    }

    return bless \$pid, $class;
}

Recall that the statement with the ||= operator is just shorthand for

$pid = $pid || $$;      # set if not set

We say the object contains an anonymous scalar, but it doesn't really become anonymous until my $pid goes out of scope, since that's the variable we're generating a reference to when we bestow the blessing. When returning a reference to an array or hash, one could use the same approach by employing a lexically scoped array or hash variable, but usually people just use the anonymous array or hash composers, [] and {}. There is no similar composer for anonymous scalars.

On the subject of subterfuge, the kill isn't really killing the process. On most UNIX systems, a signal 0 merely checks to see whether the process is there.

This particular tie class has chosen to return an error value rather than raise an exception if its constructor fails. Other classes may not wish to be so forgiving. (In any event, the tie itself will throw an exception when the constructor fails to return an object. But you get more error messages this way, which many folks seem to prefer.) This routine checks the global variable $^W (which reflects Perl's -w flag) to see whether to emit its extra bit of noise.

But for all that, it's an ordinary constructor, and doesn't know it's being called from tie. It just suspects it strongly.

FETCH THIS

This method is triggered every time the tied variable is accessed (that is, read). It takes no arguments beyond a reference to the object that is tied to the variable. (The FETCH methods for arrays and hashes do, though.) Since in this case we're just using a scalar thingy as the tied object, a simple scalar dereference, $$self, allows the method to get at the real value stored in its object. In the example below, that real value is the process ID to which we've tied our variable.

sub FETCH {
    my $self = shift;       # ref to scalar

    confess "wrong type" unless ref $self;
    croak "too many arguments" if @_;

    my $nicety;
    local $! = 0;           # preserve errno
    $nicety = getpriority(PRIO_PROCESS, $$self);
    if ($!) { croak "getpriority failed: $!" }
    return $nicety;
}

This time we've decided to blow up (raise an exception) if the getpriority function fails - there's no place for us to return an error otherwise, and it's probably the right thing to do.

Note the absence of a $ on PRIO_PROCESS. That's really a subroutine call into BSD::Resource that returns the appropriate constant to feed back into getpriority. The PRIO_PROCESS declaration was imported by the use declaration. And that's why there's no $ on the front of it - it's not a variable. (If you had put a $, the use strict would have caught it for you as an unqualified global.)

STORE THIS, VALUE

This method is triggered every time the tied variable is set (assigned). The first argument, THIS, is again a reference to the object associated with the variable, and VALUE is the value the user is assigning to the variable.

sub STORE {
    my $self = shift;
    my $new_nicety = shift;

    confess "wrong type" unless ref $self;
    croak "too many arguments" if @_;

    if ($new_nicety < PRIO_MIN) {
        carp sprintf
          "WARNING: priority %d less than minimum system priority %d",
              $new_nicety, PRIO_MIN if $^W;
        $new_nicety = PRIO_MIN;
    }

    if ($new_nicety > PRIO_MAX) {
        carp sprintf
          "WARNING: priority %d greater than maximum system priority %d",
              $new_nicety, PRIO_MAX if $^W;
        $new_nicety = PRIO_MAX;
    }

    unless (defined setpriority(PRIO_PROCESS, $$self, $new_nicety)) {
        confess "setpriority failed: $!";
    }
    return $new_nicety;
}

There doesn't appear to be anything worth explaining there, except maybe that we return the new value because that's what an assignment returns.

DESTROY THIS

This method is triggered when the object associated with the tied variable needs to be destructed (usually only when it goes out of scope). As with other object classes, such a method is seldom necessary, since Perl deallocates the moribund object's memory for you automatically. Here, we'll use a DESTROY method for debugging purposes only.

sub DESTROY {
    my $self = shift;
    confess "wrong type" unless ref $self;
    carp "[ Nice::DESTROY pid $$self ]" if $DEBUG;
}

That's about all there is to it. Actually, it's more than all there is to it, since we've done a few nice things here for the sake of completeness, robustness, and general aesthetics (or lack thereof). Simpler TIESCALAR classes are certainly possible.

5.4.2 Tying Arrays

A class implementing a tied ordinary array must define the following methods: TIEARRAY, FETCH, STORE, and perhaps DESTROY.

Tied arrays are incomplete. There are, as yet, no defined methods to deal with $#ARRAY access (which is hard, since it's an lvalue), nor with the other obvious array functions, like push, pop, shift, unshift, and splice. This means that a tied array doesn't behave like an untied one. You can't even determine the length of the array. But if you use the tied arrays only for simple read and write access you'll be OK. These restrictions will be removed in a future release.

For the purpose of this discussion, we will implement an array whose indices are fixed at its creation. If you try to access anything beyond those bounds, you will cause an exception.

require Bounded_Array;
tie @ary, 'Bounded_Array', 2;   # maximum allowable subscript is 2
$| = 1;
for $i (0 .. 10) {
    print "setting index $i: ";
    $ary[$i] = 10 * $i;         # should raise exception on 3
    print "value of element $i now $ary[$i]\n";
}

The preamble code for the class is as follows:

package Bounded_Array;
use Carp;
use strict;

TIEARRAY CLASSNAME, LIST

This is the constructor for the class. That means it is expected to return a blessed reference through which the new array (probably an anonymous array reference) will be accessed.

In our example, just to demonstrate that you don't really have to use an array thingy, we'll choose a hash thingy to represent our object. A hash works out well as a generic record type: the {BOUND} field will store the maximum bound allowed, and the {ARRAY} field will hold the true array reference. Anyone outside the class who tries to dereference the object returned (doubtless thinking it an array reference), will blow up. This just goes to show that you should respect an object's privacy (unless you're well acquainted and committed to maintaining a good relationship for the rest of your life).

sub TIEARRAY {
    my $class = shift;
    my $bound = shift;

    confess "usage: tie(\@ary, 'Bounded_Array', max_subscript)"
        if @_ or $bound =~ /\D/;

    return bless {
        BOUND => $bound,
        ARRAY => [],
    }, $class;
}

In this case we have used the anonymous hash composer rather than a lexically scoped variable that goes out of scope. We also used the array composer within the hash composer.

FETCH THIS, INDEX

This method will be triggered every time an individual element in the tied array is accessed (read). It takes one argument beyond its self reference: the index we're trying to fetch. (The index is an integer, but just because the caller thinks of it as a mundane integer doesn't mean you have to do anything "linear" with it. You could use it to seed a random number generator, for instance, or process it with a hash function to do a random lookup in a hash table.)

Here we use list assignment rather than shift to process the method arguments. TMTOWTDI.

sub FETCH {
    my ($self, $idx) = @_;
    if ($idx > $self->{BOUND}) {
        confess "Array OOB: $idx > $self->{BOUND}";
    }
    return $self->{ARRAY}[$idx];
}

As you may have noticed, the names of the FETCH, STORE, and DESTROY methods are the same for all tied classes, even though the constructors differ in name (TIESCALAR versus TIEARRAY). While in theory you could have the same class servicing several tied types, in practice this becomes cumbersome, and it's easiest to simply write them with one type per class.

STORE THIS, INDEX, VALUE

This method will be triggered every time an element in the tied array is set (written). It takes two arguments beyond its self reference: the index at which we're trying to store something and the value we're trying to put there. For example:

sub STORE {
    my ($self, $idx, $value) = @_;
    if ($idx > $self->{BOUND} ) {
        confess "Array OOB: $idx > $self->{BOUND}";
    }
    return $self->{ARRAY}[$idx] = $value;
}

DESTROY THIS

This method will be triggered when the tied object needs to be deallocated. As with the scalar tie class, this is almost never needed in a language that does its own storage allocation, so this time we'll just leave it out.

The code we presented at the beginning of this section attempts several out-of-bounds accesses. It will therefore generate the following output:

setting index 0: value of element 0 now 0
setting index 1: value of element 1 now 10
setting index 2: value of element 2 now 20
setting index 3: Array OOB: 3 > 2 at Bounded_Array.pm line 39
        Bounded_Array::FETCH called at testba line 12

5.4.3 Tying Hashes

For historical reasons, hashes have the most complete and useful tie implementation. A class implementing a tied associative array must define various methods. TIEHASH is the constructor. FETCH and STORE access the key/value pairs. EXISTS reports whether a key is present in the hash, and DELETE deletes one. CLEAR empties the hash by deleting all the key/value pairs. FIRSTKEY and NEXTKEY implement the keys and each built-in functions to iterate over all the keys. And DESTROY (if defined) is called when the tied object is deallocated.

If this seems like a lot, then feel free to inherit most of these methods from the standard Tie::Hash module, redefining only the interesting ones. See the Tie::Hash module documentation in Chapter 7 for details.

Remember that Perl distinguishes a key not existing in the hash from a key that exists with an undefined value. The two possibilities can be tested with the exists and defined functions, respectively.

Because functions like keys and values may return huge array values when used on large hashes (like tied DBM files), you may prefer to use the each function to iterate over such. For example:

# print out B-news history file offsets
use NDBM_File;
tie(%HIST, 'NDBM_File', '/usr/lib/news/history', 1, 0);
while (($key,$val) = each %HIST) {
    print $key, ' = ', unpack('L',$val), "\n";
}
untie(%HIST);

(But does anyone run B-news any more?)

Here's an example of a somewhat peculiar tied hash class: it gives you a hash representing a particular user's dotfiles (that is, files whose names begin with a period). You index into the hash with the name of the file (minus the period) and you get back that dotfile's contents. For example:

use DotFiles;
tie %dot, "DotFiles";
if ( $dot{profile} =~ /MANPATH/ or
     $dot{login}   =~ /MANPATH/ or
     $dot{cshrc}   =~ /MANPATH/    )
{
    print "you've set your manpath\n";
}

Here's another way to use our tied class:

# third argument is name of user whose dot files we will tie to
tie %him, 'DotFiles', 'daemon';
foreach $f ( keys %him ) {
    printf "daemon dot file %s is size %d\n",
        $f, length $him{$f};
}

In our DotFiles example we implement the object as a regular hash containing several important fields, of which only the {CONTENTS} field will be what the user thinks of as the real hash. Here are the fields:

USER: Whose dot files this object represents
HOME: Where those dotfiles live
CLOBBER: Whether we are allowed to change or remove those dot files
CONTENTS: The hash of dotfile names and content mappings

Here's the start of DotFiles.pm:

package DotFiles;
use Carp;
sub whowasi { (caller(1))[3] . '()' }
my $DEBUG = 0;
sub debug { $DEBUG = @_ ? shift : 1 }

For our example, we want to be able to emit debugging information to help in tracing during development. We also keep one convenience function around internally to help print out warnings; whowasi() returns the name of the function that called the current function (whowasi()'s "grandparent" function).

Here are the methods for the DotFiles tied hash.

TIEHASH CLASSNAME, LIST

This is the constructor for the class. That means it is expected to return a blessed reference through which the new object may be accessed. Again, the user of the tied class probably has little need of the object. It's Perl itself that needs the returned object so that it can magically call the right methods when the tied variable is accessed.

Here's the constructor:

sub TIEHASH {
    my $self = shift;
    my $user = shift || $>;
    my $dotdir = shift || "";

    croak "usage: @{[&whowasi]} [USER [DOTDIR]]" if @_;

    $user = getpwuid($user) if $user =~ /^\d+$/;
    my $dir = (getpwnam($user))[7]
            or croak "@{[&whowasi]}: no user $user";
    $dir .= "/$dotdir" if $dotdir;

    my $node = {
        USER        => $user,
        HOME        => $dir,
        CONTENTS    => {},
        CLOBBER     => 0,
    };

    opendir DIR, $dir
            or croak "@{[&whowasi]}: can't opendir $dir: $!";
    foreach $dot ( grep /^\./ && -f "$dir/$_", readdir(DIR)) {
        $dot =~ s/^\.//;
        $node->{CONTENTS}{$dot} = undef;
    }
    closedir DIR;

    return bless $node, $self;
}

It's probably worth mentioning that if you're going to filetest the return values returned by that readdir, you'd better prepend the directory in question (as we do). Otherwise, since no chdir was done, you'd test the wrong file.

FETCH THIS, KEY

This method will be triggered every time an element in the tied hash is accessed (read). It takes one argument beyond its self reference: the key whose value we're trying to fetch. The key is a string, and you can do anything you like with it (consistent with its being a string).

Here's the fetch for our DotFiles example.

sub FETCH {
    carp &whowasi if $DEBUG;
    my $self = shift;
    my $dot = shift;
    my $dir = $self->{HOME};
    my $file = "$dir/.$dot";

    unless (exists $self->{CONTENTS}->{$dot} || -f $file) {
        carp "@{[&whowasi]}: no $dot file" if $DEBUG;
        return undef;
    }

    # Implement a cache.
    if (defined $self->{CONTENTS}->{$dot}) {
        return $self->{CONTENTS}->{$dot};
    } else {
        return $self->{CONTENTS}->{$dot} = `cat $dir/.$dot`;
    }
}

This function was easy to write by having it call the UNIX cat(1) command, but it would be more portable (and somewhat more efficient) to open the file ourselves. On the other hand, since dot files are a UNIXy concept, we're not that concerned.

STORE THIS, KEY, VALUE

This method will be triggered every time an element in the tied hash is set (written). It takes two arguments beyond its self reference: the key under which we're storing the value and the value we're putting there.

Here in our DotFiles example we won't let users overwrite a file without first calling the clobber() method on the original object reference returned by tie.

sub STORE {
    carp &whowasi if $DEBUG;
    my $self = shift;
    my $dot = shift;
    my $value = shift;
    my $file = $self->{HOME} . "/.$dot";

    croak "@{[&whowasi]}: $file not clobberable"
        unless $self->{CLOBBER};

    open(F, "> $file") or croak "can't open $file: $!";
    print F $value;
    close(F);
}

If they want to clobber something, they can say:

$ob = tie %daemon_dots, 'daemon';
$ob->clobber(1);
$daemon_dots{signature} = "A true daemon\n";

But there's also the tied function, so they could alternatively set clobber using:

tie %daemon_dots, 'daemon';
tied(%daemon_dots)->clobber(1);

The clobber method is simply:

sub clobber {
    my $self = shift;
    $self->{CLOBBER} = @_ ? shift : 1;
}

DELETE THIS, KEY

This method is triggered when we remove an element from the hash, typically by using the delete function. Again, we'll be careful to check whether the user really wants to clobber files.

sub DELETE   {
    carp &whowasi if $DEBUG;

    my $self = shift;
    my $dot = shift;
    my $file = $self->{HOME} . "/.$dot";
    croak "@{[&whowasi]}: won't remove file $file"
        unless $self->{CLOBBER};
    delete $self->{CONTENTS}->{$dot};
    unlink $file or carp "@{[&whowasi]}: can't unlink $file: $!";
}

CLEAR THIS

This method is triggered when the whole hash is to be cleared, usually by assigning the empty list to it.

In our example, that would remove all the user's dotfiles! It's such a dangerous thing that we'll require CLOBBER to be set higher than 1 before this can happen.

sub CLEAR {
    carp &whowasi if $DEBUG;
    my $self = shift;
    croak "@{[&whowasi]}: won't remove all dotfiles for $self->{USER}"
        unless $self->{CLOBBER} > 1;
    my $dot;
    foreach $dot ( keys %{$self->{CONTENTS}}) {
        $self->DELETE($dot);
    }
}

EXISTS THIS, KEY

This method is triggered when the user invokes the exists function on a particular hash. In our example, we'll look at the {CONTENTS} hash element to find the answer:

sub EXISTS   {
    carp &whowasi if $DEBUG;
    my $self = shift;
    my $dot = shift;
    return exists $self->{CONTENTS}->{$dot};
}

FIRSTKEY THIS

This method is triggered when the user begins to iterate through the hash, such as with a keys or each call. By calling keys in a scalar context, we reset its internal state to ensure that the next each used in the return statement will get the first key.

sub FIRSTKEY {
    carp &whowasi if $DEBUG;
    my $self = shift;
    my $a    = keys %{$self->{CONTENTS}};
    return scalar each %{$self->{CONTENTS}};
}

NEXTKEY THIS, LASTKEY

This method is triggered during a keys or each iteration. It has a second argument which is the last key that has been accessed. This is useful if the NEXTKEY method needs to know its previous state to calculate the next state.

For our example, we are using a real hash to represent the tied hash's data, except that this hash is stored in the hash's CONTENTS field instead of in the hash itself. So we can just rely on Perl's each iterator:

sub NEXTKEY  {
    carp &whowasi if $DEBUG;
    my $self = shift;
    return scalar each %{ $self->{CONTENTS} }
}

DESTROY THIS

This method is triggered when a tied hash's object is about to be deallocated. You don't really need it except for debugging and extra cleanup. Here's a very simple function:

sub DESTROY  {
    carp &whowasi if $DEBUG;
}