4.3 Closures

Instead of returning data, a Perl subroutine can return a reference to a subroutine. This is really no different from any other way of passing subroutine references around, except for a somewhat hidden feature involving anonymous subroutines and lexical (my) variables. Consider:

$greeting = "hello world";
$rs = sub {
    print $greeting;
};
&$rs();  #prints "hello world"

In this example, the anonymous subroutine makes use of the global variable $greeting. No surprises here, right? Now, let's modify this innocuous example slightly:

sub generate_greeting {
    my($greeting) = "hello world";
    return sub {print $greeting};
}
$rs = generate_greeting();
&$rs(); # Prints "hello world"

The generate_greeting subroutine returns a reference to an anonymous subroutine, which in turn prints $greeting. The curious thing is that $greeting is a my variable that belongs to generate_greeting. Once generate_greeting finishes executing, you would expect all its lexical variables to be destroyed. But when you invoke the anonymous subroutine later on, using &$rs(), it still manages to print $greeting. How does it work?

Any other expression in place of the anonymous subroutine definition would have used $greeting right away. A subroutine block, on the other hand, is a package of code to be invoked at a later time, so it keeps track of all the variables it is going to need later on (taking them "to go," in a manner of speaking). When this subroutine is subsequently called and executes print $greeting, it still has hold of the $greeting variable that existed when the subroutine was created, and hence of its value.
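A small refinement of the "to go" picture: what the subroutine takes along is the variable itself, not a frozen copy of its value. The sketch below is a variation on generate_greeting in which an extra assignment, added purely for illustration, changes $greeting after the anonymous subroutine has been created; the closure sees the change because it shares the same variable.

```perl
use strict;
use warnings;

sub generate_greeting {
    my $greeting = "hello";
    my $code = sub { return $greeting };   # closure created here...
    $greeting = "goodbye";                 # ...but $greeting changes afterwards
    return $code;
}

my $rs = generate_greeting();
print $rs->(), "\n";   # prints "goodbye", not "hello"
```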

Let's modify this a bit more to really understand what this idiom is capable of:

sub generate_greeting {
    my($greeting) = @_;     # $greeting primed by arguments
    return sub {
        my($subject) = @_;
        print "$greeting $subject\n";
    };
}
$rs1 = generate_greeting("hello");
$rs2 = generate_greeting("my fair");

# $rs1 and $rs2 are two subroutines, each holding on to its own $greeting
&$rs1("world");  # prints "hello world"
&$rs2("lady");   # prints "my fair lady"

Instead of hardcoding $greeting, we get it from generate_greeting's arguments. When generate_greeting is called the first time, the anonymous subroutine that it returns holds onto $greeting's value. Hence the subroutine referred to by $rs1 behaves somewhat like this:

$rs1 = sub {
    my ($subject) = @_;
    my $greeting = "hello";
    print "$greeting $subject\n";   # $greeting's value is "hello"
};

The subroutine is known as a closure (the term comes from the LISP world). As you can see, it captures $greeting's value, and when it is invoked later on, it needs only one parameter.

Like some immigrants to a country who retain the culture and customs of the place in which they were born, closures are subroutines that package all the variables they need from the scope in which they are created.

As it happens, Perl creates closures only over lexical (my) variables and not over global or localized (tagged with local) variables. Let's take a peek under the covers to understand why this is so.

4.3.1 Closures, Behind the Scenes

If you are not interested in the details of how closures work, you can safely go on to the next section without loss of continuity.

Recall that the name of a variable and its value are separate entities. When it first sees $greeting, Perl binds the name "greeting" to a freshly allocated scalar value, setting the value's reference count to 1 (there's now an arrow pointing to the value). At the end of the block, Perl disassociates the name from the scalar value and decrements the value's reference count. In a typical block where you don't squirrel away references to that value, the value would be deallocated, since the reference count comes down to zero. In this example, however, the anonymous subroutine happens to use $greeting, so it increments that scalar value's reference count, thus preventing its automatic deallocation when generate_greeting finishes. When generate_greeting is called a second time, the name "greeting" is bound to a whole new scalar value, and so the second closure gets to hang on to its own scalar value.
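The reference-counting story can be observed directly. The following is a minimal sketch, assuming the core Scalar::Util module's weaken function: it keeps a weak reference to each $greeting scalar (the @watch array is an illustrative addition, not part of the original example). Because a weak reference does not raise the reference count, the scalar stays alive only as long as its closure does, and each call to generate_greeting produces a distinct scalar.

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);

my @watch;    # weak references, used only to observe each scalar's lifetime

sub generate_greeting {
    my ($greeting) = @_;
    push @watch, \$greeting;   # a strong reference at first...
    weaken($watch[-1]);        # ...weakened in place, so it no longer counts
    return sub { print "$greeting\n" };
}

my $rs1 = generate_greeting("hello");
my $rs2 = generate_greeting("my fair");

# Each call bound the name "greeting" to a freshly allocated scalar.
my $distinct   = $watch[0] != $watch[1];
# Both scalars outlive generate_greeting because the closures hold them.
my $both_alive = defined $watch[0] && defined $watch[1];

undef $rs1;   # drop the first closure; its scalar's count falls to zero
my $first_freed  = !defined $watch[0];
my $second_alive = defined $watch[1];

print "distinct=$distinct alive=$both_alive freed=$first_freed kept=$second_alive\n";
```

Note that weaken is applied to the array element itself: copying a weak reference produces a strong one, so weakening before the push would not work.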

Why don't closures work with local variables? Recall from Chapter 3, Typeglobs and Symbol Tables, that variables marked local are dynamically scoped (or "temporarily global"). A local variable's value depends on the call stack at the moment at which it is used. For this reason, if $greeting were declared local, Perl would look up its value when the anonymous subroutine is called (actually when print is called inside it), not when it is defined. You can verify this with a simple test:

sub generate_greeting {
    local ($greeting) = @_;
    return sub {
        print "$greeting\n";
    };
}
$rs = generate_greeting("hello");
$greeting = "Goodbye";
&$rs();      # Prints "Goodbye", not "hello"

The anonymous subroutine is not a closure in this case, because it doesn't hang onto the local value of $greeting ("hello") at the time of its creation. Once generate_greeting has finished executing, $greeting is back to its old global value, which is what is seen by the anonymous subroutine while executing.
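Putting the two declarations side by side makes the difference concrete. In this sketch, with_local and with_my are hypothetical names chosen for illustration; the only difference between them is local versus my. Under use strict, the package variable must be declared with our before local can be applied to it.

```perl
use strict;
use warnings;

our $greeting;   # package (global) variable, needed for local()

sub with_local {
    local $greeting = shift;     # dynamically scoped: restored on return
    return sub { return $greeting };
}

sub with_my {
    my $greeting = shift;        # lexical: this is what the closure captures
    return sub { return $greeting };
}

my $dyn = with_local("hello");
my $lex = with_my("hello");
$greeting = "Goodbye";           # assigns to the package variable only

print $dyn->(), "\n";   # Goodbye (looked up at call time)
print $lex->(), "\n";   # hello (captured when the sub was created)
```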

It might appear that every time generate_greeting returns an anonymous subroutine, it creates a whole new packet of code internally. That isn't so. The code for the anonymous subroutine is generated once, at compile time. $rs is internally a reference to a "code value," which in turn keeps track not only of the compiled code itself (Perl's internal opcodes, which it shares with all other subroutine references pointing to the same piece of code), but also of all the variables it requires from its environment (each subroutine reference packs its own private context for later use). Chapter 20 does less hand-waving and supplies exact details.
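The sharing can be glimpsed with the core B module, which exposes perl's internal structures. This is a sketch that relies on an implementation detail of perl 5 (that cloned closures reuse the prototype's op tree), not on any documented language guarantee: the two code references are different code values, yet B::svref_2object reports the same root op for both.

```perl
use strict;
use warnings;
use B ();

sub generate_greeting {
    my ($greeting) = @_;
    return sub { my ($subject) = @_; return "$greeting $subject" };
}

my $rs1 = generate_greeting("hello");
my $rs2 = generate_greeting("my fair");

# Two distinct code values, each carrying its own captured $greeting...
my $distinct_cvs = $rs1 != $rs2;

# ...whose compiled op trees are one and the same (compared by address).
my $root1 = ${ B::svref_2object($rs1)->ROOT };
my $root2 = ${ B::svref_2object($rs2)->ROOT };
my $shared_code = $root1 == $root2;

print "distinct code values: ", ($distinct_cvs ? "yes" : "no"), "\n";
print "shared op tree:       ", ($shared_code ? "yes" : "no"), "\n";
print $rs1->("world"), " / ", $rs2->("lady"), "\n";
```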

To summarize, a closure is the special case of an anonymous subroutine holding onto data that used to belong to its scope at the time of its creation.
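A common use of this property is private, persistent state. In the sketch below, make_counter is a hypothetical helper (not from any library): each closure it returns holds on to its own $count, which nothing outside the closure can reach.

```perl
use strict;
use warnings;

# make_counter is a hypothetical helper used only for illustration.
sub make_counter {
    my $count = @_ ? $_[0] : 0;      # private starting value
    return sub { return ++$count };  # each call bumps the captured $count
}

my $c1 = make_counter();
my $c2 = make_counter(10);

my $first  = $c1->();   # 1
my $second = $c1->();   # 2
my $other  = $c2->();   # 11: $c2 has its own $count
print "$first $second $other\n";
```

Because each call to make_counter binds $count to a fresh scalar, the two counters advance independently.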