If the user supplies no arguments, Perl sets @ARGV to a single string, "-". This is shorthand for STDIN when opened for reading and STDOUT when opened for writing. It's also what lets the user of your program specify "-" as a filename on the command line to read from STDIN.
Next, the file processing loop removes one argument at a time from @ARGV and copies the filename into the global variable $ARGV. If the file cannot be opened, Perl goes on to the next one. Otherwise, it processes a line at a time. When the file runs out, the loop goes back and opens the next one, repeating the process until @ARGV is exhausted.
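What that amounts to, roughly, is the following hand-written loop. This is only an approximation of the built-in behavior, not code from the recipe:

# a rough approximation of what while (<>) does for you
unshift(@ARGV, '-') unless @ARGV;       # no arguments? default to STDIN
while ($ARGV = shift @ARGV) {
    unless (open(ARGV, $ARGV)) {        # magic open: no "<" mode supplied
        warn "Can't open $ARGV: $!\n";
        next;                           # skip files that can't be opened
    }
    while (defined($_ = <ARGV>)) {
        # ... process the line in $_ ...
    }
}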
The open statement didn't say open(ARGV, "< $ARGV"). There's no extra less-than symbol supplied. This allows for interesting effects, like passing the string "gzip -dc file.gz |" as an argument, to make your program read the output of the command "gzip -dc file.gz". See Recipe 16.6 for more about this use of magic open.
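For example, the findlogin1 program shown later in this section reads its input through <>, so you could point it at compressed data without decompressing to a temporary file first (the filename here is only illustrative):

% findlogin1 'gzip -dc syslog.gz |'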
You can change @ARGV before or inside the loop. Let's say you don't want the default behavior of reading from STDIN if there aren't any arguments; you want it to default to all the C or C++ source and header files. Insert this line before you start processing <ARGV>:
@ARGV = glob("*.[Cch]") unless @ARGV;
Process options before the loop, either with one of the Getopt libraries described in Chapter 15, User Interfaces, or manually:
# arg demo 1: Process optional -c flag
if (@ARGV && $ARGV[0] eq '-c') {
    $chop_first++;
    shift;
}

# arg demo 2: Process optional -NUMBER flag
if (@ARGV && $ARGV[0] =~ /^-(\d+)$/) {
    $columns = $1;
    shift;
}

# arg demo 3: Process clustering -a, -i, -n, or -u flags
while (@ARGV && $ARGV[0] =~ /^-(.+)/ && (shift, ($_ = $1), 1)) {
    next if /^$/;
    s/a// && (++$append,      redo);
    s/i// && (++$ignore_ints, redo);
    s/n// && (++$nostdout,    redo);
    s/u// && (++$unbuffer,    redo);
    die "usage: $0 [-ainu] [filenames] ...\n";
}
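If you'd rather not write the flag parsing yourself, here is a minimal sketch using Getopt::Std (one of the libraries covered in Chapter 15) to handle the same -a, -i, -n, and -u flags; the %opt hash and the variable names are just for illustration:

use Getopt::Std;

my %opt;
getopts('ainu', \%opt)                  # same flags as arg demo 3
    or die "usage: $0 [-ainu] [filenames] ...\n";
$append      = $opt{a};
$ignore_ints = $opt{i};
$nostdout    = $opt{n};
$unbuffer    = $opt{u};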
Other than its implicit looping over command-line arguments, <> is not special. The special variables controlling I/O still apply; see Chapter 8 for more on them. You can set $/ to set the line terminator, and $. contains the current line (record) number. If you undefine $/, you don't get the concatenated contents of all files at once; you get one complete file each time:
undef $/;
while (<>) {
    # $_ now has the complete contents of
    # the file whose name is in $ARGV
}
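Along the same lines, setting $/ to the empty string puts the reads into paragraph mode; this is a sketch of the idea, not part of the recipe:

$/ = "";                # paragraph mode: records end at one or more blank lines
while (<>) {
    # $_ now holds one blank-line-separated paragraph
    # from the file whose name is in $ARGV
}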
If you localize $/, the old value is automatically restored when the enclosing block exits:
{                           # create block for local
    local $/;               # record separator now undef
    while (<>) {
        # do something; called functions still have
        # undeffed version of $/
    }
}                           # $/ restored here
Because processing <ARGV> never explicitly closes filehandles, the record number in $. is not reset. If you don't like that, you can explicitly close the file yourself to reset $.:
while (<>) {
print "$ARGV:$.:$_";
close ARGV if eof;
}
The eof function defaults to checking the end-of-file status of the last file read. Since the last handle read was ARGV, eof reports whether we're at the end of the current file. If so, we close it and reset the $. variable. On the other hand, the special notation eof() with parentheses but no argument checks whether we've reached the end of all the files in the <ARGV> processing.
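Here's a minimal sketch of the difference (the marker text is arbitrary). Because eof() refers to the pseudofile made of all the command-line arguments, the marker is printed only once, just before the very last line of the very last file, whereas a bare eof would be true at the end of every file:

while (<>) {
    print "--- last line of all input coming up ---\n" if eof();
    print;
}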
Perl has command-line options, -n, -p, and -i, to make writing filters and one-liners easier.
The -n option adds the while (<>) loop around your program text. It's normally used for filters like grep or programs that summarize the data they read. Example 7.1 shows such a filter with the loop written out explicitly.
#!/usr/bin/perl
# findlogin1 - print all lines containing the string "login"
while (<>) {                # loop over files on command line
    print if /login/;
}
The program in Example 7.1 could be written as shown in Example 7.2.
#!/usr/bin/perl -n
# findlogin2 - print all lines containing the string "login"
print if /login/;
You can combine the -n and -e options to run Perl code from the command line:
% perl -ne 'print if /login/'
The -p option is like -n, but it adds a print at the end of the loop. It's normally used for programs that translate their input. Example 7.3 shows such a program with the loop and the print written out explicitly.
#!/usr/bin/perl
# lowercase - turn all lines into lowercase
use locale;
while (<>) {                        # loop over lines on command line
    s/([^\W0-9_])/\l$1/g;           # change all letters to lowercase
    print;
}
The program in Example 7.3 could be written as shown in Example 7.4.
#!/usr/bin/perl -p
# lowercase - turn all lines into lowercase
use locale;
s/([^\W0-9_])/\l$1/g;               # change all letters to lowercase
Or written from the command line as:
% perl -Mlocale -pe 's/([^\W0-9_])/\l$1/g'
While using -n or -p for implicit input looping, the special label LINE: is silently created for the whole input loop. That means that from an inner loop, you can go on to the next input record by using next LINE (this is like awk's next). Go on to the next file by closing ARGV (this is like awk's nextfile). This is shown in Example 7.5.
#!/usr/bin/perl -n
# countchunks - count how many words are used.
# skip comments, and bail on file if __END__
# or __DATA__ seen.
for (split /\W+/) {
    next LINE if /^#/;
    close ARGV if /__(DATA|END)__/;
    $chunks++;
}
END { print "Found $chunks chunks\n" }
The tcsh shell keeps a .history file in a format such that every other line contains a commented-out timestamp in Epoch seconds:
#+0894382237
less /etc/motd
#+0894382239
vi ~/.exrc
#+0894382242
date
#+0894382242
who
#+0894382288
telnet home
A simple one-liner can render that legible:
% perl -pe 's/^#\+(\d+)\n/localtime($1) . " "/e'
Tue May 5 09:30:37 1998 less /etc/motd
Tue May 5 09:30:39 1998 vi ~/.exrc
Tue May 5 09:30:42 1998 date
Tue May 5 09:30:42 1998 who
Tue May 5 09:31:28 1998 telnet home
The -i option changes each file on the command line. It is described in Recipe 7.9, and is normally used in conjunction with -p.
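For instance, here is a sketch of the usual combination (the backup extension and the substitution are arbitrary), which edits the named files in place while saving the originals under a .bak suffix:

% perl -i.bak -pe 's/seperate/separate/g' *.txt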
You have to say use locale to handle the current character set.