regex perl

Why does running the same regex twice yield different results?


While trying to answer another question, I encountered some odd behavior from Perl's regex engine. I have a string containing two quantities that I'm trying to match with a regex. The regex matches any 8 characters followed by the string "units/ml". I want to capture both quantities.

This script prints only the second match:

use warnings;
use strict;
my $line = 'some data 100,000 units/ml data 20,000 units/ml data';
my @array;
if ($line =~ m/.{8}units\/ml/g) {
    @array = $line =~ m/.{8}units\/ml/g;
    print join(' ', @array) . "\n";
}

Its output:

 20,000 units/ml

If I run line 6 (the line that assigns to @array) twice:

use warnings;
use strict;
my $line = 'some data 100,000 units/ml data 20,000 units/ml data';
my @array;
if ($line =~ m/.{8}units\/ml/g) {
    @array = $line =~ m/.{8}units\/ml/g;
    # Let's run that again, for good measure...
    @array = $line =~ m/.{8}units\/ml/g;
    print join(' ', @array) . "\n";
}

Its output:

100,000 units/ml  20,000 units/ml

Why do these two scripts yield different results?


Solution

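  • It's because of the /g modifier in your if condition. The if evaluates the =~ in scalar context, so it finds only the first match and remembers where that match ended (the position reported by pos($line)). Then, inside your if block, the @array assignment, which is in list context, continues the search from that position and returns only the remaining matches. (This is useful for parsing.)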

    When you run the extra match, the previous list-context match has already run to the end of the string, and that failed final attempt resets the search position. So the search starts over from the beginning, again in list context, and this time you get everything.

    If you remove the g flag from the match in your if condition, things work as you expect.
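
The behavior described above can be made visible by inspecting pos($line) between matches. The following is a minimal sketch rather than a definitive diagnosis: it reuses the $line and regex from the question, and the variable names ($found, $pos_after_scalar, @rest, @all, @fixed) are introduced here purely for illustration.

```perl
use strict;
use warnings;

my $line = 'some data 100,000 units/ml data 20,000 units/ml data';

# Scalar-context /g match (same as the if condition in the question):
# finds the first quantity and remembers where the match ended.
my $found = $line =~ m/.{8}units\/ml/g;
my $pos_after_scalar = pos($line);    # offset just past the first "units/ml"

# List-context /g match: resumes from pos($line), so it only
# sees what is left of the string, i.e. the second quantity.
my @rest = $line =~ m/.{8}units\/ml/g;
my $pos_after_list = pos($line);      # undef: the search ran to failure, which resets pos

# With pos reset, the next list-context match starts from the beginning.
my @all = $line =~ m/.{8}units\/ml/g;

# The fix: no /g in the condition, so the test never sets pos($line).
my @fixed;
if ($line =~ m/.{8}units\/ml/) {
    @fixed = $line =~ m/.{8}units\/ml/g;
}

print "pos after scalar match: $pos_after_scalar\n";
print "pos after list match:   ", (defined $pos_after_list ? $pos_after_list : 'undef'), "\n";
print "rest:  ", join(' ', @rest),  "\n";
print "all:   ", join(' ', @all),   "\n";
print "fixed: ", join(' ', @fixed), "\n";
```

If you do want to keep /g in the condition for some reason, assigning undef to pos($line) before the list-context match should have the same effect, since that clears the remembered position.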