Use hashes instead of linear searches.
For example, instead of searching through @keywords to see if
$_ is a keyword, construct a hash with:

    my %keywords;
    for (@keywords) {
        $keywords{$_}++;
    }

Then you can quickly tell whether $_ contains a keyword by testing
$keywords{$_} for a non-zero value.
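A minimal, self-contained sketch of the idea (the keyword list here is made up for illustration):

```perl
# Hypothetical keyword list; any array of strings works the same way.
my @keywords = qw(if else while for return);
my %keywords;
$keywords{$_}++ for @keywords;

# Each lookup is now a single hash probe instead of a scan of @keywords.
my @hits;
for my $word (qw(while total return)) {
    push @hits, $word if $keywords{$word};
}
```

The one-time cost of building the hash pays for itself as soon as you do more than a handful of lookups.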
Avoid subscripting when a foreach or list operator will do. Subscripting
sometimes forces conversion from floating point to integer, and
there's often a better way to do it. Consider using foreach, shift,
and splice operations. Consider saying "use integer".
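For instance, summing an array with foreach avoids the index arithmetic that subscripting requires (the numbers here are arbitrary):

```perl
my @costs = (1.5, 2.25, 3.75);

# Subscripting: every iteration computes and applies an index.
my $sum1 = 0;
for (my $i = 0; $i <= $#costs; $i++) {
    $sum1 += $costs[$i];
}

# foreach: Perl walks the list directly, no subscripts needed.
my $sum2 = 0;
foreach my $cost (@costs) {
    $sum2 += $cost;
}
```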
Avoid goto.
It scans outward from your current location for the indicated label.
Avoid printf if print will work.
Quite apart from the extra overhead of printf, some
implementations have field length limitations that print gets
around.
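A small illustration (the variable values are made up); plain print with interpolation does the same job as a printf format here:

```perl
my $name  = "camel";   # hypothetical values
my $count = 3;

# Instead of: printf "%s seen %d times\n", $name, $count;
my $line = "$name seen $count times\n";
print $line;
```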
Avoid $&, $`, and $'.
Any occurrence in your program causes all matches to save the searched
string for possible future reference. (However, once you've blown it, it
doesn't hurt to have more of them.)
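One way to get at the matched text without paying that penalty is an explicit capture (the sample string is made up):

```perl
my $line  = "error: disk full";   # hypothetical input
my $match = '';

# Instead of matching /error: \w+/ and then reading $&,
# capture exactly what you need with parentheses:
if ($line =~ /(error: \w+)/) {
    $match = $1;    # same text $& would have held
}
```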
Avoid using eval on a string. An eval of a string (not of a
BLOCK
) forces recompilation every time through. The
Perl parser is pretty fast for a parser, but that's not saying much. Nowadays
there's almost always a better way to do what you want anyway. In particular,
any code that uses eval merely to construct
variable names is obsolete, since you can now do the same directly using
symbolic references:

    ${$pkg . '::' . $varname} = &{ "fix_" . $varname }($pkg);
Avoid string eval inside a loop.
Put the loop into the eval instead, to avoid redundant
recompilations of the code. See the study operator
in Chapter 3 for an example of this.
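A sketch of moving the loop inside the string eval so the code is compiled only once ($op here is a made-up, run-time-chosen piece of code):

```perl
my @nums  = (1, 2, 3);
my $op    = '$total += $_';   # hypothetical code decided at run time
my $total = 0;

# Bad:  eval "$op" inside a foreach recompiles $op every pass.
# Good: put the whole loop inside one eval, compiled once.
eval "foreach (\@nums) { $op }";
die $@ if $@;
```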
Avoid run-time-compiled patterns. Use the /pattern/o
(once only) pattern modifier to avoid pattern recompilation when the
pattern doesn't change over the life of the process.
For patterns that change
occasionally, you can use the fact that a null pattern refers back to
the previous pattern, like this:
    "foundstring" =~ /$currentpattern/;   # Dummy match (must succeed).
    while (<>) {
        print if //;
    }
You can also use eval to recompile a subroutine that does the match (if
you only recompile occasionally).
Short-circuit alternation is often faster than the corresponding
regular expression. So:
    print if /one-hump/ || /two/;
is likely to be faster than:
    print if /one-hump|two/;
at least for certain values of one-hump and two.
This is because the optimizer likes to hoist certain simple matching
operations up into higher parts of the syntax tree and do very fast
matching with a Boyer-Moore algorithm. A complicated pattern defeats
this.
Reject common cases early with next if.
As with simple regular expressions, the optimizer likes this. And it just
makes sense to avoid unnecessary work. You can typically discard comment
lines and blank lines even before you do a split or chop:
    while (<>) {
        next if /^#/;
        next if /^$/;
        chop;
        @piggies = split(/,/);
        ...
    }
Avoid regular expressions with many quantifiers, or with big {m,n}
numbers on parenthesized expressions. Such patterns can result in
exponentially slow backtracking behavior unless the quantified
subpatterns match on their first "pass".
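For example, a nested quantifier like (a+)+ can backtrack exponentially on a near-miss, because each extra character roughly doubles the number of ways the groups can divide the input before the match finally fails. The flat equivalent fails in one pass (the test strings are arbitrary):

```perl
# /(?:a+)+b/ matches "aaaab" quickly, but against a string of a's with
# no final 'b' it explores every way to split the a's between the inner
# and outer quantifiers.  /a+b/ tests the same thing with no nesting.
my $ok  = ("aaaab" =~ /^(?:a+)+b/) ? 1 : 0;
my $bad = ("aaaa!" =~ /^a+b/)      ? 1 : 0;   # flat pattern, one pass
```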
Try to maximize the length of any non-optional literal strings in
regular expressions. This is counterintuitive, but longer patterns
often match faster than shorter patterns. That's because the
optimizer looks for constant strings and hands them off to a
Boyer-Moore search, which benefits from longer strings. Compile your
pattern with the -Dr debugging switch to see what Perl thinks the
longest literal string is.
Avoid expensive subroutine calls in tight loops.
There is overhead associated with calling subroutines, especially when
you pass lengthy parameter lists, or return lengthy values. In
increasing order of desperation, try passing values by reference,
passing values as dynamically scoped globals, inlining the subroutine,
or rewriting the whole loop in C.
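A sketch of the first escalation step, passing a big array by reference instead of flattening it into @_ (the subroutine name and data are made up):

```perl
my @big = (1 .. 1000);

# Flattened call, sum_list(@big), would copy 1000 scalars into @_
# on every call.  Passing \@big copies just one scalar: the reference.
sub sum_ref {
    my ($aref) = @_;
    my $sum = 0;
    $sum += $_ for @$aref;
    return $sum;
}

my $total = sum_ref(\@big);
```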
Avoid getc for anything but single-character terminal I/O.
In fact, don't use it for that either. Use sysread.
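A sketch of reading one character with sysread, which issues a single read(2) with no stdio buffering layer (reading from a scratch file here rather than a terminal, so the example is self-contained):

```perl
# Create a small scratch file to read from.
open(my $out, '>', 'scratch.tmp') or die "can't write: $!";
print $out "hello";
close $out;

open(my $in, '<', 'scratch.tmp') or die "can't read: $!";
my $char;
sysread($in, $char, 1);    # one unbuffered system call, one byte
close $in;
unlink 'scratch.tmp';
```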
Use readdir rather than <*>.
To get all the non-dot files within a directory, say something like:
    opendir(DIR, ".");
    @files = sort grep(!/^\./, readdir(DIR));
    closedir(DIR);
Avoid frequent substr on long strings.
Use pack and unpack instead of multiple substr invocations.
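For example, pulling three fixed-width fields out of a record with one unpack call instead of three substr calls (the record layout is invented: a 4-character id, an 8-character name, a 3-character quantity):

```perl
my $record = "0042camelcar012";   # hypothetical fixed-width record

# Three substr calls...
my $id1   = substr($record, 0, 4);
my $name1 = substr($record, 4, 8);
my $qty1  = substr($record, 12, 3);

# ...versus one unpack with an "A4 A8 A3" template.
my ($id2, $name2, $qty2) = unpack("A4 A8 A3", $record);
```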
Use substr as an lvalue rather than
concatenating substrings. For example, to replace the fourth through sixth
characters of $foo with the contents of the variable $bar, don't do:

    $foo = substr($foo,0,3) . $bar . substr($foo,6);

Instead, simply identify the part of the string to be replaced,
and assign into it, as in:

    substr($foo,3,3) = $bar;

But be aware that if $foo is a huge string, and $bar isn't exactly 3
characters long, this can do a lot of copying too.
Use s/// rather than concatenating substrings.
This is especially true if you can replace one constant with another of
the same size. This results in an in-place substitution.
Use statement modifiers and the equivalent and and or operators
instead of full-blown conditionals.
Statement modifiers and logical operators avoid the overhead of entering
and leaving a block. They can often be more readable too.
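For instance (the variables are made up; all three forms below do the same thing):

```perl
my $verbose = 1;
my $count   = 0;

# Full-blown conditional: enters and leaves a block.
if ($verbose) {
    $count++;
}

# Statement modifier: no block at all.
$count++ if $verbose;

# Logical operator form, common for guards and error handling.
$verbose and $count++;
```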
Use $foo = $a || $b || $c.
This is much faster (and shorter to say) than:
    if ($a) {
        $foo = $a;
    }
    elsif ($b) {
        $foo = $b;
    }
    elsif ($c) {
        $foo = $c;
    }
Similarly, set default values with:
    $pi ||= 3;
Group together any tests that want the same initial string.
When testing a string for various prefixes in anything resembling a
switch structure, put together all the /^a/ patterns, all the
/^b/ patterns, and so on.
Don't test things you know won't match.
Use last or elsif to avoid falling through to the next
case in your switch statement.
Use special operators like study, logical string operations, and the
pack 'u' and unpack '%' formats.
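For example, the '%' prefix makes unpack compute a checksum directly in C, replacing a Perl-level loop over the bytes (the data string is arbitrary):

```perl
my $data = "hello world";

# Perl-level loop: sum the byte values ourselves, one op per byte.
my $slow = 0;
$slow += ord($_) for split //, $data;

# unpack '%32C*': a 32-bit checksum over the same bytes, done
# internally in one call.
my $fast = unpack("%32C*", $data);
```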
Beware of the tail wagging the dog.
Misstatements resembling (<STDIN>)[0] and 0 .. 2000000 can
cause Perl much unnecessary work. In accord with UNIX philosophy, Perl
gives you enough rope to hang yourself.
Factor operations out of loops. The Perl optimizer does not attempt to
remove invariant code from loops. It expects you to exercise some sense.
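A sketch of hoisting an invariant computation out of a loop (the values are arbitrary):

```perl
my @prices = (10, 20, 30);
my $rate   = 0.0625;    # hypothetical tax rate

# Don't recompute the invariant (1 + $rate) on every pass;
# factor it out once, before the loop.
my $factor = 1 + $rate;
my $total  = 0;
$total += $_ * $factor for @prices;
```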
Slinging strings can be faster than slinging arrays.
Slinging arrays can be faster than slinging strings.
It all depends on whether you're going to reuse the strings or arrays,
and on which operations you're going to perform. Heavy modification of each
element implies that arrays will be better, and occasional modification of
some elements implies that strings will be better. But you just have to
try it and see.
my variables are normally faster than local variables.
Sorting on a manufactured key array may be faster than using a fancy sort
subroutine.
A given array value may participate in several sort comparisons, so if
the sort subroutine has to do much recalculation, it's better to
factor out that calculation to a separate pass before the actual sort.
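A sketch of precomputing the key once per element instead of once per comparison, using string length as a stand-in for an expensive calculation (the word list is made up):

```perl
my @words = qw(pearl camel ox llama);

# Fancy sort subroutine: length() runs inside every comparison.
my @slow = sort { length($a) <=> length($b) || $a cmp $b } @words;

# Manufactured keys: compute length once per word, sort the
# [key, word] pairs, then strip the keys back off.
my @fast = map  { $_->[1] }
           sort { $a->[0] <=> $b->[0] || $a->[1] cmp $b->[1] }
           map  { [ length($_), $_ ] } @words;
```

This map-sort-map idiom trades a little memory for the pair arrays against doing the key calculation O(n log n) times.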
tr/abc//d is faster than s/[abc]//g.
print with a comma separator may be faster than concatenating strings.
For example:
    print $fullname{$name} . " has a new home directory " .
          $home{$name} . "\n";

has to glue together the two hash values and the two
fixed strings before passing them to the low-level print routines, whereas:
    print $fullname{$name}, " has a new home directory ",
          $home{$name}, "\n";
doesn't. On the other hand, depending on the values and the architecture,
the concatenation may be faster. Try it.
Prefer join("", ...) to a series of concatenated strings.
Multiple concatenations may cause strings to be copied back and
forth multiple times. The join operator avoids this.
split on a fixed string is generally faster than split on a pattern.
That is, use split(/ /,...) rather than split(/ +/,...)
if you know there will only be one space. However, the patterns
/\s+/, /^/, and / / are specially optimized, as is the split
on whitespace.
Pre-extending an array or string can save some time.
As strings and arrays grow, Perl extends them by allocating a new copy
with some room for growth and copying in the old value. Pre-extending a
string with the x operator or an array by setting $#array
can prevent this occasional overhead, as well as minimize memory
fragmentation.
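For example (the sizes here are arbitrary):

```perl
# Pre-extend an array to 10,000 elements by setting $#array:
my @buffer;
$#buffer = 9_999;          # room allocated up front

# Pre-extend a string to 8K with the x operator:
my $block = "\0" x 8192;
```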
Don't undef long strings and arrays if they'll be reused for the
same purpose.
This helps prevent reallocation when the string or array must be re-extended.
Prefer "\0" x 8192 over unpack("x8192",()).
system("mkdir...") may be faster on multiple directories if
mkdir(2) isn't available.
Avoid using eof if return values will already indicate it.
Cache entries from passwd and group (and so on) that are apt to be reused.
For example, to cache the return value from gethostbyaddr when
you are converting numeric addresses (like 198.112.208.11) to names
(like "www.ora.com"), you can use something like:

    sub numtoname {
        local($_) = @_;
        unless (defined $numtoname{$_}) {
            local(@a) = gethostbyaddr(pack('C4', split(/\./)), 2);
            $numtoname{$_} = @a > 0 ? $a[0] : $_;
        }
        $numtoname{$_};
    }
Avoid unnecessary system calls.
Operating system calls tend to be rather expensive. So for example,
don't call the time operator when a cached value of $now
would do. Use the special _ filehandle to avoid unnecessary
stat(2) calls. On some systems, even a minimal system call may
execute a thousand instructions.
Avoid unnecessary subprocesses.
The system operator has to fork a subprocess and execute the
program you specify. Or worse, execute a shell to execute the program
you specify. This can easily execute a million instructions.
Worry about starting subprocesses, but only if they're frequent.
Starting a single pwd, hostname, or find process isn't
going to hurt you much - after all, a shell starts subprocesses all day
long. We do occasionally encourage the toolbox approach, believe it or not.
Keep track of your working directory yourself rather than calling
pwd repeatedly.
(A package is provided in the standard library for this.
See the Cwd module in Chapter 7.)
Avoid shell metacharacters in commands - pass lists to system and
exec where appropriate.
Set the sticky bit on the Perl interpreter on machines without demand paging.
    chmod +t /usr/bin/perl
Using defaults doesn't make your program faster.