3.2.102 open

open FILEHANDLE, EXPR
open FILEHANDLE

This function opens the file whose filename is given by EXPR, and associates it with FILEHANDLE. If EXPR is omitted, the scalar variable of the same name as the FILEHANDLE must contain the filename. (And you must also be careful to use "or die" after the statement rather than "|| die", because the precedence of || is higher than list operators like open.) FILEHANDLE may be a directly specified filehandle name, or an expression whose value will be used for the filehandle. The latter is called an indirect filehandle. If you supply an undefined variable for the indirect filehandle, Perl will not automatically fill it in for you - you have to make sure the expression returns something unique, either a string specifying the actual filehandle name, or a filehandle object from one of the object-oriented I/O packages. (A filehandle object is unique because you call a constructor to generate the object. See the example later in this section.)
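The precedence point can be seen in a small sketch. The path in $missing is hypothetical and is assumed not to exist; eval is used here only so that both forms can be tried in one script.

```perl
# Hypothetical path, assumed not to exist on your system.
$missing = "/no/such/dir/no-such-file";

# Parsed as: open(FH, ($missing || die ...)). $missing is a true
# string, so die is never reached and the failed open goes unnoticed.
$r1 = eval { open FH, $missing || die "Can't open $missing: $!\n"; 1 };

# Parsed as: open(FH, $missing) or die ..., so the failure is caught.
$r2 = eval { open FH, $missing or die "Can't open $missing: $!\n"; 1 };
$err = $@;

print "with ||: ", ($r1 ? "no exception\n" : "died\n");
print "with or: ", ($r2 ? "no exception\n" : "died: $err");
```

If you prefer ||, parenthesize the arguments to open so that the || applies to the return value of open rather than to the filename.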

After the filehandle is determined, the filename string is processed. First, any leading and trailing whitespace is removed from the string. Then the string is examined on both ends for characters specifying how the file is to be opened. (By an amazing coincidence, these characters look just like the characters you'd use to indicate I/O redirection to the Bourne shell.) If the filename begins with < or nothing, the file is opened for input. If the filename begins with >, the file is truncated and opened for output. If the filename begins with >>, the file is opened for appending. (You can also put a + in front of the > or < to indicate that you want both read and write access to the file.) If the filename begins with |, the filename is interpreted as a command to which output is to be piped, and if the filename ends with a |, the filename is interpreted as a command which pipes input to us. You may not have an open command that pipes both in and out, although the IPC::Open2 and IPC::Open3 library routines give you a close equivalent. See the section "Bidirectional Communication" in Chapter 6.

Any pipe command containing shell metacharacters is passed to /bin/sh for execution; otherwise it is executed directly by Perl. The filename "-" refers to STDIN, and ">-" refers to STDOUT. open returns non-zero upon success, the undefined value otherwise. If the open involved a pipe, the return value happens to be the process ID of the subprocess.
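As a minimal sketch of the three basic modes, the following writes, appends to, and then reads back a scratch file. The name in $tmp is hypothetical, and a writable /tmp directory is assumed.

```perl
# Hypothetical scratch file; $$ (our process ID) keeps the name unique.
$tmp = "/tmp/open-demo.$$";

open OUT, ">$tmp" or die "Can't create $tmp: $!\n";     # truncate, then write
print OUT "first\n";
close OUT;

open OUT, ">>$tmp" or die "Can't append to $tmp: $!\n"; # append
print OUT "second\n";
close OUT;

open IN, "<$tmp" or die "Can't read $tmp: $!\n";        # read (the < is optional)
@lines = <IN>;
close IN;

print @lines;       # prints "first" and then "second"
unlink $tmp;
```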

If you're unfortunate enough to be running Perl on a system that distinguishes between text files and binary files (modern operating systems don't care), then you should see binmode for tips on dealing with this. The key distinction between systems that need binmode and those that don't is their text file formats. Systems like UNIX and Plan 9 that delimit lines with a single character, and that encode that character in C as '\n', do not need binmode. The rest need it.

Here is some code that shows the relationship between a filehandle and a variable of the same name:

$ARTICLE = "/usr/spool/news/comp/lang/perl/misc/38245";
open ARTICLE or die "Can't find article $ARTICLE: $!\n";
while (<ARTICLE>) {...

Append to a file like this:

open LOG, '>>/usr/spool/news/twitlog'; # (`log' is reserved)

Pipe your data from a process:

open ARTICLE, "caesar <$article |";   # decrypt article with rot13

Here < does not indicate that Perl should open the file for input, because < is not the first character of EXPR. Rather, the concluding | indicates that input is to be piped from caesar <$article (from the program caesar, which takes $article as its standard input). The < is interpreted by the subshell that Perl uses to start the pipe, because < is a shell metacharacter.

Or pipe your data to a process:

open EXTRACT, "|sort >/tmp/Tmp$$";    # $$ is our process number
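For a version of the input-pipe form that you can run as is, here is a small sketch that reads one line from a command. It assumes only that an echo program is available on your PATH.

```perl
# A trailing | means "run this command and let us read its output".
# The command has no shell metacharacters, so Perl runs it directly.
open FROM, "echo hello world |" or die "Can't start echo: $!\n";
$line = <FROM>;
close FROM;

print "read from pipe: $line";
```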

In this next example we show one way to do recursive opens, via indirect filehandles. The files will be opened on filehandles fh01, fh02, fh03, and so on. Because $input is a local variable, it is preserved through recursion, allowing us to close the correct file before we return.

# Process argument list of files along with any includes.

foreach $file (@ARGV) {
    process($file, 'fh00');
}

sub process {
    local($filename, $input) = @_;
    $input++;               # this is a string increment
    unless (open $input, $filename) {
        print STDERR "Can't open $filename: $!\n";
        return;
    }
    while (<$input>) {      # note the use of indirection
        if (/^#include "(.*)"/) {
            process($1, $input);
            next;
        }
        ...               # whatever
    }
    close $input;
}

You may also, in the Bourne shell tradition, specify an EXPR beginning with >&, in which case the rest of the string is interpreted as the name of a filehandle (or file descriptor, if numeric) which is to be duped and opened.[6] You may use & after >, >>, <, +>, +>>, and +<. The mode you specify should match the mode of the original filehandle.

[6] The word "dup" is UNIX-speak for "duplicate". We're not really trying to dupe you. Trust us.

Here is a script that saves, redirects, and restores STDOUT and STDERR:

#!/usr/bin/perl
open SAVEOUT, ">&STDOUT" or die "Can't dup stdout";
open SAVEERR, ">&STDERR" or die "Can't dup stderr";

open STDOUT, ">foo.out" or die "Can't redirect stdout";
open STDERR, ">&STDOUT" or die "Can't dup stdout";

select STDERR; $| = 1;         # make unbuffered
select STDOUT; $| = 1;         # make unbuffered

print STDOUT "stdout 1\n";     # this propagates to
print STDERR "stderr 1\n";     # subprocesses too

close STDOUT;
close STDERR;

open STDOUT, ">&SAVEOUT";
open STDERR, ">&SAVEERR";

print STDOUT "stdout 2\n";
print STDERR "stderr 2\n";

If you specify <&=N, where N is a number, then Perl will do an equivalent of C's fdopen(3) of that file descriptor; this is more parsimonious with file descriptors than the dup form described earlier. (On the other hand, it's more dangerous, since two filehandles may now be sharing the same file descriptor, and a close on one filehandle may prematurely close the other.) For example:

open FILEHANDLE, "<&=$fd";
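The difference between the two forms shows up in the descriptor numbers. The sketch below compares them with fileno; it assumes a system that provides /dev/null, and the filehandle names ORIG, ALIAS, and COPY are made up for the illustration.

```perl
open ORIG, "</dev/null" or die "Can't open /dev/null: $!\n";
$fd = fileno(ORIG);

# fdopen form: ALIAS is a second filehandle on the very same descriptor.
open ALIAS, "<&=$fd" or die "Can't fdopen $fd: $!\n";
$alias_fd = fileno(ALIAS);

# dup form: COPY gets a fresh descriptor that refers to the same file.
open COPY, "<&ORIG" or die "Can't dup ORIG: $!\n";
$copy_fd = fileno(COPY);

print "ORIG is $fd, ALIAS is $alias_fd, COPY is $copy_fd\n";

close COPY;     # harmless: only the duplicate goes away
close ALIAS;    # closes descriptor $fd out from under ORIG as well
```

The last line is the danger mentioned above: after closing ALIAS, the descriptor behind ORIG is gone even though ORIG was never explicitly closed.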

If you open a pipe to or from the command "-" (that is, either |- or -|), then an implicit fork is done, and the return value of open is the pid of the child within the parent process, and 0 within the child process. (Use defined($pid) in either the parent or child to determine whether the open was successful.) The filehandle behaves normally for the parent, but input and output on that filehandle are piped from or to the STDOUT or STDIN of the child process. In the child process the filehandle isn't opened - I/O happens from or to the new STDIN or STDOUT. Typically this is used like the normal piped open when you want to exercise more control over just how the pipe command gets executed, such as when you are running setuid, and don't want to have to scan shell commands for metacharacters. The following pairs are equivalent:

open FOO, "|tr '[a-z]' '[A-Z]'";
open FOO, "|-" or exec 'tr', '[a-z]', '[A-Z]';

open FOO, "cat -n file|";
open FOO, "-|" or exec 'cat', '-n', 'file';
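The child does not have to exec anything. As a minimal sketch of the forking open on a system that supports fork, here the child simply prints to its STDOUT and the parent reads that line through the filehandle (KID is a name chosen for the example):

```perl
$pid = open KID, "-|";          # implicit fork
die "Can't fork: $!\n" unless defined $pid;

if ($pid == 0) {
    # Child: KID is not open here, but our STDOUT is the pipe.
    print "hello from child\n";
    exit 0;
}

# Parent: reading KID reads whatever the child wrote to its STDOUT.
$got = <KID>;
close KID;                      # waits for the child to finish
print "parent read: $got";
```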

Explicitly closing any piped filehandle causes the parent process to wait for the child to finish, and returns the status value in $?. On any operation which may do a fork, unflushed buffers remain unflushed in both processes, which means you may need to set $| on one or more filehandles to avoid duplicate output (and then do output to flush them).
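Here is a short sketch of picking up the child's status after the close. It assumes /bin/sh is available; the command is just a stand-in that exits with a known status. As with wait, the exit value is in the high byte of $?.

```perl
# The quotes are shell metacharacters, so this command goes through /bin/sh.
open KID, "sh -c 'exit 3' |" or die "Can't fork: $!\n";
@out = <KID>;                   # the command produces no output
close KID;                      # waits for the child and sets $?

$exit = $? >> 8;                # exit value is in the high byte
print "child exited with status $exit\n";
```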

Filehandles STDIN, STDOUT, and STDERR remain open following an exec. Other filehandles do not. (However, on systems supporting the fcntl function, you may modify the close-on-exec flag for a filehandle. See fcntl earlier in this chapter. See also the special $^F variable.)
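As a sketch of the fcntl approach, the following clears the close-on-exec flag on a filehandle so that it would survive an exec. It assumes a system with fcntl and /dev/null, and it takes the constants from the Fcntl module; KEEP is a name chosen for the example.

```perl
use Fcntl;      # supplies F_GETFD, F_SETFD, and FD_CLOEXEC

open KEEP, "</dev/null" or die "Can't open /dev/null: $!\n";

# fcntl returns "0 but true" for a zero result, so "or die" is safe here.
$before = fcntl(KEEP, F_GETFD, 0) or die "Can't get flags: $!\n";
fcntl(KEEP, F_SETFD, $before & ~FD_CLOEXEC) or die "Can't set flags: $!\n";
$after = fcntl(KEEP, F_GETFD, 0) or die "Can't get flags: $!\n";

print "close-on-exec is ", (($after & FD_CLOEXEC) ? "set" : "clear"), "\n";
```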

Using the constructor from the FileHandle module, described in Chapter 7, you can generate anonymous filehandles which have the scope of whatever variables hold references to them, and automatically close whenever and however you leave that scope:

use FileHandle;
...
sub read_myfile_munged {
    my $ALL = shift;
    my $handle = new FileHandle;
    open $handle, "myfile" or die "myfile: $!";
    $first = <$handle> or return ();      # Automatically closed here.
    mung $first or die "mung failed";     # Or here.
    return $first, <$handle> if $ALL;     # Or here.
    $first;                               # Or here.
}

In order to open a file with arbitrary weird characters in its name, it's necessary to protect any leading and trailing whitespace, like this:

$file =~ s#^(\s)#./$1#;
open (FOO, "< $file\0");

But we've never actually seen anyone use that in a script...

If you want a real C open(2), then you should use the sysopen function. This is another way to protect your filenames from interpretation. For example:

use FileHandle;
sysopen HANDLE, $path, O_RDWR|O_CREAT|O_EXCL, 0700
    or die "sysopen $path: $!";
HANDLE->autoflush(1);
HANDLE->print("stuff $$\n");
seek HANDLE, 0, 0;
print "File contains: ", <HANDLE>;

See seek for some details about mixing reading and writing.