Do recursive regexes understand named captures? The docs note that (??{ code }) is an independent subpattern with its own set of captures, which are discarded when the subpattern is done, and the entry for (?PARNO) says it is "similar to (??{ code })". Is (?PARNO) discarding its own named captures when it's done?
I'm writing about Perl's recursive regular expressions for Mastering Perl. perlre already has an example with balanced parens (I show it in Matching balanced parenthesis in Perl regex), so I thought I'd try balanced quote marks:
#!/usr/bin/perl
# quotes-nested.pl
use v5.10;
$_ =<<'HERE';
He said 'Amelia said "I am a camel"'
HERE
say "Matched!" if m/
(
['"]
(
(?:
[^'"]+
|
( (?1) )
)*
)
['"]
)
/xg;
print "
1 => $1
2 => $2
3 => $3
4 => $4
5 => $5
";
This works, and the two quoted strings show up in $1 and $3:
Matched!
1 => 'Amelia said "I am a camel"'
2 => Amelia said "I am a camel"
3 => "I am a camel"
4 =>
5 =>
That's fine. I understand that. However, I don't want to know the numbers. So I make the first capture group a named capture and look in %-, expecting to see the two substrings I previously saw in $1 and $3:
use v5.10;
$_ =<<'HERE';
He said 'Amelia said "I am a camel"'
HERE
say "Matched [$+{said}]!" if m/
(?<said>
['"]
(
(?:
[^'"]+
|
(?1)
)*
)
['"]
)
/xg;
use Data::Dumper;
print Dumper( \%- );
I only see the first:
Matched ['Amelia said "I am a camel"']!
$VAR1 = {
'said' => [
'\'Amelia said "I am a camel"\''
]
};
I expected that (?1) would repeat everything in the first capture group, including the named capture to said. I can fix that a bit by naming a new capture:
use v5.10;
$_ =<<'HERE';
He said 'Amelia said "I am a camel"'
HERE
say "Matched [$+{said}]!" if m/
(?<said>
['"]
(
(?:
[^'"]+
|
(?<said> (?1) )
)*
)
['"]
)
/xg;
use Data::Dumper;
print Dumper( \%- );
Now I get what I expected:
Matched ['Amelia said "I am a camel"']!
$VAR1 = {
'said' => [
'\'Amelia said "I am a camel"\'',
'"I am a camel"'
]
};
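The second element shows up because %- holds one entry per capture group with a given name, in the order those groups appear in the pattern, and the extra (?<said> ...) added a second group. Below is a minimal standalone sketch of that behavior, separate from the quote matching (the string and the name word are illustrative choices, not from the original). It compares %- with %+, which returns only the leftmost group with that name that matched:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use v5.10;

my $string = 'camel llama';

# Both capture groups are named "word". %+ reports only the leftmost
# group with that name that matched; %- keeps one array slot per
# group, in the order the groups appear in the pattern.
my ( $leftmost, @all );
if ( $string =~ m/(?<word>\w+) \s+ (?<word>\w+)/x ) {
    $leftmost = $+{word};
    @all      = @{ $-{word} };
}

say "\$+{word} => $leftmost";
say "\$-{word} => [@all]";
```

This should print $+{word} => camel followed by $-{word} => [camel llama].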
I thought that I could fix this by moving the named capture up one level:
use v5.10;
$_ =<<'HERE';
He said 'Amelia said "I am a camel"'
HERE
say "Matched [$+{said}]!" if m/
(
(?<said>
['"]
(
(?:
[^'"]+
|
(?1)
)*
)
['"]
)
)
/xg;
use Data::Dumper;
print Dumper( \%- );
But this doesn't catch the smaller substring in said either:
Matched ['Amelia said "I am a camel"']!
$VAR1 = {
'said' => [
'\'Amelia said "I am a camel"\''
]
};
I think I understand this, but I also know that there are people here who actually touch the C code that makes it happen. :)
And as I write this, I think I should override STORE in the tie behind %- to find out, but then I'd have to figure out how to do that.
After playing around with this, I'm satisfied that what I said in the question is right: each call to (?PARNO) gets a complete and separate set of the match variables, which it discards at the end of its run.
You can collect everything that matched in each subpattern by using an array external to the match operator and pushing onto it at the end of the repeated subpattern, as in this example:
#!/usr/bin/perl
# nested_carat_n.pl
use v5.10;
$_ =<<'HERE';
Outside "Top Level 'Middle Level "Bottom Level" Middle' Outside"
HERE
my @matches;
say "Matched!" if m/
(?(DEFINE)
(?<QUOTE_MARK> ['"])
(?<NOT_QUOTE_MARK> [^'"])
)
(
(?<quote>(?"E_MARK))
(?:
(?&NOT_QUOTE_MARK)++
|
(?R)
)*
\g{quote}
)
(?{ push @matches, $^N })
/x;
say join "\n", @matches;
I go through it in depth in Chapter 2 of Mastering Perl, which you can read for free (at least for a while).