split /PATTERN/, EXPR, LIMIT
split /PATTERN/, EXPR
split /PATTERN/
split
This function scans a string given by EXPR for delimiters, and
splits the string into a list of substrings, returning the resulting
list value in list context, or the count of substrings in scalar
context. The delimiters are determined by repeated pattern matching,
using the regular expression given in PATTERN, so the delimiters
may be of any size, and need not be the same string on every match.
(The delimiters are not ordinarily returned, but see below.) If the
PATTERN doesn't match at all, split returns the original
string as a single substring. If it matches once, you get two
substrings, and so on.
If LIMIT is specified and is positive, the function splits into no
more than that many fields (though it may split into fewer if it runs out of
delimiters). If LIMIT is negative, it is treated as if an arbitrarily
large LIMIT had been specified, so trailing null fields are kept. If
LIMIT is omitted (or zero), trailing null fields are stripped from the
result (which potential users of pop would do well to remember). If
EXPR is omitted, the function splits the $_ string. If PATTERN is also
omitted, the function splits on whitespace, /\s+/, after skipping any
leading whitespace.
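A minimal sketch of the LIMIT rules and the no-argument default described above. The sample record and the whitespace string are made up for illustration; the values in the comments are what these particular inputs produce.

```perl
use strict;
use warnings;

my $record = "a:b:c::";    # made-up record ending in two empty fields

# LIMIT omitted: trailing empty fields are stripped.
my @default = split /:/, $record;         # ('a', 'b', 'c')

# Negative LIMIT: no limit, and the trailing empty fields are kept.
my @kept = split /:/, $record, -1;        # ('a', 'b', 'c', '', '')

# Positive LIMIT: at most that many fields; the last one holds the rest.
my @two = split /:/, $record, 2;          # ('a', 'b:c::')

# No PATTERN and no EXPR: split $_ on whitespace, ignoring leading whitespace.
$_ = "   alpha  beta\tgamma\n";
my @words = split;                        # ('alpha', 'beta', 'gamma')

print scalar(@default), " ", scalar(@kept), "\n";    # prints "3 5"
```

Passing -1 is the usual way to keep empty trailing columns when a record's field count matters.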
Strings of any length can be split:
@chars = split //, $word;
@fields = split /:/, $line;
@words = split ' ', $paragraph;
@lines = split /^/m, $buffer;
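The four one-liners above can be run as-is once their variables have values. In this sketch the sample strings are hypothetical; only the variable names come from the examples.

```perl
use strict;
use warnings;

# Hypothetical inputs for the four examples.
my $word      = "perl";
my $line      = "root:x:0:0";
my $paragraph = "  the quick\tbrown  fox\n";
my $buffer    = "first line\nsecond line\nthird line\n";

my @chars  = split //, $word;          # ('p', 'e', 'r', 'l')
my @fields = split /:/, $line;         # ('root', 'x', '0', '0')
my @words  = split ' ', $paragraph;    # ('the', 'quick', 'brown', 'fox')
my @lines  = split /^/m, $buffer;      # each line keeps its trailing "\n"

print scalar(@lines), " lines\n";      # prints "3 lines"
```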
A pattern capable of matching either the null string or something longer than
the null string (for instance, a pattern consisting of any single character
modified by a * or ?) will split the value of EXPR into separate characters
wherever it is the null string that produces the match; non-null matches will
skip over occurrences of the delimiter in the usual fashion. (In other words,
a pattern won't match in one spot more than once, even if it matched with a
zero width.) For example:
print join ':', split / */, 'hi there';
produces the output "h:i:t:h:e:r:e".
The space disappears because it matched as part of the delimiter.
As a trivial case, the null
pattern // simply splits into separate
characters (and spaces do not disappear).
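A small sketch contrasting the two patterns just discussed on the same string: / */ consumes the space as a non-null delimiter, while // only ever matches the null string, so the space comes through as a field of its own.

```perl
use strict;
use warnings;

# / */ matches either nothing or a run of spaces.
my $starred = join ':', split / */, 'hi there';   # "h:i:t:h:e:r:e"

# The empty pattern matches only the null string, so the space survives.
my $empty = join ':', split //, 'hi there';        # "h:i: :t:h:e:r:e"

print "$starred\n$empty\n";
```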
The LIMIT parameter is used to split only part of a string:
($login, $passwd, $remainder) = split /:/, $_, 3;
We encourage you to split to lists of names like this in order to make your code
self-documenting. (For purposes of error checking, note that $remainder
would be undefined if there were fewer than three fields.) When assigning to
a list, if LIMIT is omitted, Perl supplies a LIMIT one larger than the
number of variables in the list, to avoid unnecessary work. For the split
above, LIMIT would have been 4 by default, and $remainder would have
received only the third field, not all the rest of the fields. In
time-critical applications it behooves you not to split into more fields
than you really need.
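A sketch of the difference between an explicit LIMIT and the implicit one Perl supplies for list assignment. The passwd-style record is hypothetical.

```perl
use strict;
use warnings;

my $entry = "jdoe:x:1001:100:Jane Doe:/home/jdoe:/bin/sh";   # hypothetical record

# Explicit LIMIT of 3: the third variable receives everything that is left.
my ($login, $passwd, $remainder) = split /:/, $entry, 3;
# $remainder is "1001:100:Jane Doe:/home/jdoe:/bin/sh"

# LIMIT omitted: Perl supplies 4 (one more than the three variables),
# so $rest gets only the third field and the tail is discarded.
my ($user, $pw, $rest) = split /:/, $entry;
# $rest is "1001"

# Fewer fields than variables leaves the extra variables undefined.
my ($x, $y, $z) = split /:/, "only:two", 3;
print defined $z ? "defined\n" : "undefined\n";   # prints "undefined"
```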
We said earlier that the delimiters are not returned, but if the PATTERN
contains parentheses, then the substring matched by each pair of parentheses
is included in the resulting list, interspersed with the fields that are
ordinarily returned. Here's a simple case:
split /([-,])/, "1-10,20";
produces the list value:
(1, '-', 10, ',', 20)
With more parentheses, a field is returned for each pair, even if some
of the pairs don't match, in which case undefined values are returned
in those positions. So if you say:
split /(-)|(,)/, "1-10,20";
you get the value:
(1, '-', undef, 10, undef, ',', 20)
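Both results above can be checked directly. This sketch prints the undefined slots as the word undef so they are visible.

```perl
use strict;
use warnings;

# One capturing group: each delimiter is returned between the fields.
my @one = split /([-,])/, "1-10,20";
# ('1', '-', '10', ',', '20')

# Two groups: every match returns both; the group that didn't match gives undef.
my @two = split /(-)|(,)/, "1-10,20";
# ('1', '-', undef, '10', undef, ',', '20')

print join(' ', map { defined $_ ? "'$_'" : 'undef' } @two), "\n";
# prints: '1' '-' undef '10' undef ',' '20'
```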
The /PATTERN/ argument may be replaced with an expression to specify
patterns that vary at run-time. (To do run-time compilation only once, use
/$variable/o.) As a special case, specifying a space " " will split on
whitespace just as split with no arguments does. Thus, split(" ") can be
used to emulate awk's default behavior, whereas split(/ /) will give you as
many null initial fields as there are leading spaces. (Other than this
special case, if you supply a string instead of a regular expression, it'll
be interpreted as a regular expression anyway.)
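A sketch of the awk-style special case next to a literal single-space pattern, plus a pattern taken from a variable at run time. The input text and the separator in $sep are hypothetical.

```perl
use strict;
use warnings;

my $text = "   leading spaces  here";

my @awk     = split ' ', $text;    # ('leading', 'spaces', 'here')
my @literal = split / /, $text;    # ('', '', '', 'leading', 'spaces', '', 'here')

# A string held in a variable is compiled as a regular expression at run time.
my $sep   = '\s*,\s*';             # hypothetical separator chosen at run time
my @items = split $sep, "a , b,c"; # ('a', 'b', 'c')

print scalar(@literal), " fields with / /\n";   # prints "7 fields with / /"
```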
The following example splits an RFC-822 message header into a hash
containing $head{Date}, $head{Subject}, and so on. It uses the trick of
assigning a list of pairs to a hash, based on the fact that delimiters
alternate with delimited fields. It makes use of parentheses to return part
of each delimiter as part of the returned list value. Since the split
pattern is guaranteed to return things in pairs by virtue of containing one
set of parentheses, the hash assignment is guaranteed to receive a list
consisting of key/value pairs, where each key is the name of a header
field. (Unfortunately this technique loses information for multiple lines
with the same key field, such as Received-By lines. Ah, well. . . .)
$header =~ s/\n\s+/ /g; # Merge continuation lines.
%head = ('FRONTSTUFF', split /^([-\w]+):/m, $header);
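The two lines above, run against a small hypothetical header. Note that each value keeps the space after the colon and its trailing newline, and FRONTSTUFF pairs with the empty string that precedes the first field name.

```perl
use strict;
use warnings;

# Hypothetical header; the Subject line has one continuation line.
my $header = "From: someone\@example.com\n"
           . "Date: Mon, 1 Jan 2001 00:00:00 +0000\n"
           . "Subject: split\n  and join\n";

$header =~ s/\n\s+/ /g;    # Merge continuation lines.
my %head = ('FRONTSTUFF', split /^([-\w]+):/m, $header);

# $head{Subject} is " split and join\n" (leading space and newline retained).
print "Keys: ", join(', ', sort keys %head), "\n";
# prints "Keys: Date, FRONTSTUFF, From, Subject"
```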
The following example processes the entries in a UNIX passwd file. You
could leave out the chop, in which case $shell would have a newline on the
end of it.
open PASSWD, '/etc/passwd' or die "Can't open /etc/passwd: $!";
while (<PASSWD>) {
    chop;    # remove trailing newline
    ($login, $passwd, $uid, $gid, $gcos, $home, $shell) =
        split /:/;
    ...
}
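A self-contained variant of the loop above, as a sketch: it reads hypothetical passwd-format lines from an in-memory filehandle instead of /etc/passwd, and uses chomp, which removes only a trailing newline, in place of chop.

```perl
use strict;
use warnings;

# Hypothetical passwd-format data, so the sketch runs without /etc/passwd.
my $data = "root:x:0:0:root:/root:/bin/bash\n"
         . "jdoe:x:1001:100:Jane Doe:/home/jdoe:/bin/sh\n";
open my $fh, '<', \$data or die "Can't open in-memory data: $!";

my %shell_of;
while (<$fh>) {
    chomp;    # remove the trailing newline, if there is one
    my ($login, $passwd, $uid, $gid, $gcos, $home, $shell) = split /:/;
    $shell_of{$login} = $shell;
}
close $fh;

print "$_ uses $shell_of{$_}\n" for sort keys %shell_of;
```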
The inverse of split is performed by join (except that join can only join with the same delimiter between all
fields). To break apart a string with fixed-position fields, use unpack.
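A short sketch of both alternatives: join rebuilding a delimited line (this round trip holds here because the line has no trailing empty fields for split to strip), and unpack reading fixed-width columns. The 4/10/3-character layout is a hypothetical example.

```perl
use strict;
use warnings;

my $line    = "jdoe:x:1001:100";
my @fields  = split /:/, $line;
my $rebuilt = join ':', @fields;     # "jdoe:x:1001:100" again

# Hypothetical fixed layout: 4-char code, 10-char name, 3-char count.
my $record = "AB12Jane Doe  042";
my ($code, $name, $count) = unpack 'A4 A10 A3', $record;
# ('AB12', 'Jane Doe', '042'); the A format strips trailing spaces
```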