6.14. Matching from Where the Last Pattern Left Off

Problem

You want to match again from where the last pattern left off.

This is a useful approach to take when repeatedly extracting data in chunks from a string.

Solution

Use a combination of the /g match modifier, the \G pattern anchor, and the pos function.

Discussion

If you use the /g modifier on a match in scalar context, the regular expression engine remembers the position in the string where that match ended. The next time you match against the same string with /g, the engine starts looking from this remembered position. This lets you use a while loop to extract the information you want from the string, one chunk at a time.

while (/(\d+)/g) {
    print "Found $1\n";
}
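
As a self-contained variant of the loop above, the following sketch matches against a named variable rather than $_ and reports where each search will resume. The sample string and the variable names are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample input; any string containing digits would do.
my $text = "Order 66 shipped 3 boxes in 12 days";
my @found;

# In scalar context, each /g match resumes where the previous one ended.
while ($text =~ /(\d+)/g) {
    push @found, $1;
    print "Found $1, next search starts at offset ", pos($text), "\n";
}

print "All numbers: @found\n";
```

The loop ends when no further digits can be found, at which point the remembered position is cleared.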

You can also use \G in your pattern to anchor it to the end of the previous match. For example, if you had a number stored in a string with leading blanks, you could change each leading blank into the digit zero this way:

$n = "   49 here";
$n =~ s/\G /0/g;
print $n;

00049 here
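
To see what \G contributes here, it may help to run the same substitution with and without the anchor. This is only a sketch; the variable names are invented for the comparison.

```perl
use strict;
use warnings;

my $anchored   = "   49 here";
my $unanchored = "   49 here";

# With \G, each replacement must start where the previous one ended,
# so only the unbroken run of blanks at the front is changed.
$anchored =~ s/\G /0/g;

# Without \G, every blank in the string is replaced.
$unanchored =~ s/ /0/g;

print "$anchored\n";     # 00049 here
print "$unanchored\n";   # 000490here
```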

You can also make good use of \G in a while loop. Here we use \G to parse a comma-separated list of numbers (e.g., "3,4,5,9,120"):

while (/\G,?(\d+)/g) {
    print "Found number $1\n";
}
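
The effect of \G in that loop is easiest to see when the list is followed by other text. In this sketch (the input string is an assumption made up for the example), the loop stops at the first thing that is not part of the list instead of skipping ahead to a later number.

```perl
use strict;
use warnings;

my $list = "3,4,5,9,120 and then 7";   # hypothetical input
my @numbers;

# \G forces each match to begin exactly where the previous one ended,
# so the loop stops at " and" rather than jumping forward to the 7.
while ($list =~ /\G,?(\d+)/g) {
    push @numbers, $1;
}

print "Numbers: @numbers\n";   # Numbers: 3 4 5 9 120
```

Dropping the \G from the pattern would let the engine scan forward past " and then " and pick up the trailing 7 as well.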

By default, when your match fails (when we run out of numbers in the examples, for instance) the remembered position is reset to the start. If you don't want this to happen, perhaps because you want to continue matching from that position but with a different pattern, use the modifier /c with /g:

$_ = "The year 1752 lost 10 days on the 3rd of September";

while (/(\d+)/gc) {
    print "Found number $1\n";
}

if (/\G(\S+)/g) {
    print "Found $1 after the last number.\n";
}

Found number 1752
Found number 10
Found number 3
Found rd after the last number.
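
Combining /gc with \G is a common way to build a simple tokenizer: try one pattern after another at the current position, and let /c keep the position in place whenever a pattern fails. The following is a minimal sketch under that assumption; the token names and the input string are invented for illustration.

```perl
use strict;
use warnings;

my $input = "x = 42 + y";   # hypothetical input
my @tokens;

pos($input) = 0;            # start scanning at the beginning
while (pos($input) < length $input) {
    # Each pattern is anchored with \G; /c preserves pos when it fails,
    # so the next alternative is tried at the same spot.
    if    ($input =~ /\G\s+/gc)             { next }
    elsif ($input =~ /\G(\d+)/gc)           { push @tokens, "NUM($1)" }
    elsif ($input =~ /\G([A-Za-z_]\w*)/gc)  { push @tokens, "ID($1)" }
    elsif ($input =~ /\G(.)/gcs)            { push @tokens, "OP($1)" }
}

print "@tokens\n";   # ID(x) OP(=) NUM(42) OP(+) ID(y)
```

The final branch consumes any single character, so every pass through the loop advances the position and the loop is guaranteed to finish.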

As you can see, successive patterns can use /g on a string and in doing so change the location of the last successful match. The position of the last successful match is associated with the scalar being matched against, not with the pattern. Further, the position is not copied when you copy the string, nor saved if you use the ill-named local operator.
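
A short sketch of those two points, using made-up variable names: the position belongs to the scalar, so a second pattern picks up where the first stopped, and a copy of the string starts with no position at all.

```perl
use strict;
use warnings;

my $original = "aaa bbb ccc";   # hypothetical input

# A successful /g match in scalar context leaves pos at offset 3.
$original =~ /\w+/g or die "no match";
my $pos_original = pos($original);

# Copying the string copies its text, not its match position.
my $copy = $original;
my $pos_copy = pos($copy);

print "original: $pos_original\n";                              # original: 3
print "copy: ", (defined $pos_copy ? $pos_copy : "undef"), "\n"; # copy: undef

# A different pattern on the same scalar resumes from the stored position.
$original =~ /(\w+)/g or die "no match";
my $next_word = $1;
print "next word: $next_word\n";                                # next word: bbb
```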

The location of the last successful match can be read and set with the pos function, which takes as its argument the string whose position you want to get or set. If no argument is given, pos operates on $_:

print "The position in \$a is ", pos($a);
pos($a) = 30;
print "The position in \$_ is ", pos;
pos = 30;
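
The snippet above assumes strings long enough for offset 30 to make sense. As a smaller, runnable sketch (the string and variable names are hypothetical), assigning to pos chooses where the next /g match starts looking:

```perl
use strict;
use warnings;

my $str = "alpha beta gamma";   # hypothetical input

# Tell the engine to begin the next /g search at offset 6.
pos($str) = 6;

my ($word, $after);
if ($str =~ /(\w+)/g) {
    $word  = $1;
    $after = pos($str);
    print "Matched '$word', position is now $after\n";   # Matched 'beta', position is now 10
}

# Assigning undef clears the position, so the next /g match starts at 0.
pos($str) = undef;
my ($first) = $str =~ /(\w+)/g ? $1 : ();
print "After reset, first match is '$first'\n";          # After reset, first match is 'alpha'
```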

See Also

The /g modifier is discussed in perlre(1) and in the section "The rules of regular expression matching" in Chapter 2 of Programming Perl.

