regexperlregex-recursion

How to get a group matched in a recursive regular expression?


I'm writing a simple regular expression that needs receive a pair of coordinates and/or a map name.

For example:

move 10 15 # should returns [[10, 15]]
move 10 15 map # should returns [[10, 15, 'map']]
move map # should returns [['map']]
move 10 15 mapA mapB # should returns [[10, 15, 'mapA'], ['mapB']] 
move 10 15 mapA mapB 33 44 # should returns [[10, 15, 'mapA'], ['mapB'], [33, 44]]
move 10 15 mapA 33 44 mapB # should returns [[10, 15, 'mapA'], [33, 44, 'mapB']]

Then, I wrote this regular expression:

/
  (?(DEFINE)
     (?<coord>    (?<x>\d+)\s+(?<y>\d+) )
     (?<map>      (?<mapname>[a-zA-Z]+) )
     (?<commands> \s* (?: (?&coord) | (?&map) ) \s* (?&commands)? )
  )
  move\s+(?&commands)
/six

But how I can get the value for groups x, y and map using Perl?

I tried with some ways:

use strict;
use warnings;

my $command = 'move 10 15';

$command =~ /
  (?(DEFINE)
     (?<coord>    (?<x>\d+)\s+(?<y>\d+) )
     (?<map>      (?<mapname>[a-zA-Z]+) )
     (?<commands> \s* (?: (?&coord) | (?&map) ) \s* (?&commands)? )
  )
  move\s+(?&commands)
/six;

while (my ($k,$v) = each %+) { print "$k $v\n" }
print "$+{x}";

Solution

  • As the question stands, you can't have that. The perlre says about this

    Note that capture groups matched inside of recursion are not accessible after the recursion returns, so the extra layer of capturing groups is necessary.

    but the pattern <x> can't be had with "an extra layer" of capturing afterwards since it's used only inside the grammar. You can only have the whole thing

    if ($command =~ /
            move\s+ (?<match>(?&commands))
            (?(DEFINE)
                (?<coord>    (?<x>\d+)\s+(?<y>\d+) )
                (?<map>      (?<mapname>[a-zA-Z]+) )
                (?<commands> \s* (?: (?&coord) | (?&map) ) \s* (?&commands)? )
            )
        /six)
    {
        say "got: $+{match}";
    }
    

    where I've moved the ?(DEFINED) block at the end of the pattern, as recommended.

    Note that it wouldn't make sense either: in a recursive match, which of multiple <x> should one get? So you'd need to restructure the approach so to be able to re-capture the match you want; but I don't see how to do this if you want a subpattern buried so deeply.

    For the problem as presented I'd go with writing a simple parser, never mind a catch-all regex. Or, in your approach re-process the match for its parts, hopefully much easier once you have it.

    And then there are powerful tools, like Marpa::R2, Parse::RecDescent, Regexp::Grammars.