
Parsing tags from a file with Regexp::Grammars


I'm trying to capture free tags from comments in a program using Perl and the Regexp::Grammars CPAN module.

use strict;
use v5.10;
use YAML;

my $s = q{
      junk code;
      // here be tags #:tag1:
      junk code 2;
      // another one #:tag2:
      junk ...;
};

my $rg = do {
    use Regexp::Grammars;
    qr{
        <nocontext: >  
        ^ .* <Tagger> .* $
        <rule: Tagger>         <[MATCH=single_tag]> +
        <token: single_tag>    \#\:<tag>\:
        <token: tag>           <matchline> \w+
    }xms;
};

if( $s =~ $rg ) {
    say Dump( \%/ );    
} else {
    say 'no match';
}

But the YAML output shows I'm only capturing the last tag:

---
Tagger:
  - tag:
      matchline: 5
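The symptom can be reproduced with an ordinary Perl regex, without Regexp::Grammars. The sketch below is only an analogy for the grammar's leading `^ .*`. Under `/s` a greedy `.*` runs to the end of the string and backtracks just far enough to find a tag, so it lands on the last one. A non-greedy `.*?` stops at the first tag, and a global match in list context returns every tag.

```perl
use strict;
use warnings;
use v5.10;

# Same input as the question (the newline after q{ makes tag1 line 3).
my $s = q{
      junk code;
      // here be tags #:tag1:
      junk code 2;
      // another one #:tag2:
      junk ...;
};

# Greedy .* (with /s, '.' also matches newlines) grabs the whole string,
# then backtracks only until a tag fits, which is the LAST tag.
my ($greedy) = $s =~ /^ .* \#:(\w+): /xs;

# Non-greedy .*? advances as little as possible, so it finds the FIRST tag.
my ($lazy) = $s =~ /^ .*? \#:(\w+): /xs;

# A global match in list context returns the capture from every match.
my @all = $s =~ /\#:(\w+):/g;

say "greedy: $greedy";
say "lazy:   $lazy";
say "all:    @all";
```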

How can I match all tags from the input data instead?

Also, how can I capture the matched tag string itself without turning on the noisy context strings (that is, without removing the `<nocontext:>` directive), so that the result is more readable, like this:

---
Tagger:
  - tag: tag1
    matchline: 3
  - tag: tag2
    matchline: 5

Solution

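  • Found it. In the original grammar the greedy `.*` before `<Tagger>` runs ahead to the last tag, and `<[MATCH=single_tag]> +` can only match tags that follow one another with no intervening code, so only one tag is ever captured. Separating the repetitions with `% (...)` lets arbitrary text appear between consecutive tags. Moving `<matchline>` out of `tag` and into `single_tag` leaves `tag` with no subrule captures, so even under `<nocontext:>` it returns the matched string instead of a hash: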

    my $rg = do {
        use Regexp::Grammars;
        qr{
            <nocontext: >  
    
            <Tagger> 
            <rule: Tagger>         <[MATCH=single_tag]>+  % (.*?)
            <token: single_tag>    <matchline> \#\:<tag>\:
            <token: tag>           \w+
        }xms;
    };
    

    This yields the following YAML:

    ---
    Tagger:
      - matchline: 3
        tag: tag1
      - matchline: 5
        tag: tag2
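A self-contained sketch of the fixed grammar, assuming Regexp::Grammars is installed from CPAN, with a third tag added to the input. Instead of dumping `%/` with YAML it walks `$/{Tagger}` directly and collects the tags into `@found` (a name introduced here for illustration). It uses the non-greedy separator `(.*?)`: Regexp::Grammars expands `<[item]>+ % sep` roughly into `<[item]> (?: sep <[item]> )*`, so a greedy `(.*)` separator would jump from the first tag straight to the last one and skip any tags in between.

```perl
use strict;
use warnings;
use v5.10;

my $s = q{
      junk code;
      // here be tags #:tag1:
      junk code 2;
      // another one #:tag2:
      junk ...;
      // and a third #:tag3:
};

# The separator (.*?) is non-greedy, so each repetition only skips
# ahead to the NEXT tag rather than the last one.
my $rg = do {
    use Regexp::Grammars;
    qr{
        <nocontext: >

        <Tagger>
        <rule: Tagger>         <[MATCH=single_tag]>+  % (.*?)
        <token: single_tag>    <matchline> \#\:<tag>\:
        <token: tag>           \w+
    }xms;
};

my @found;
if ($s =~ $rg) {
    # Each element is a hash with 'matchline' and 'tag' keys.
    for my $t (@{ $/{Tagger} }) {
        push @found, $t->{tag};
        say "line $t->{matchline}: $t->{tag}";
    }
}
else {
    say 'no match';
}
```

With this input the loop should report tag1, tag2 and tag3 on lines 3, 5 and 7.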