perlparsingparenthesesparse-recdescent

Parsing string with nested parentheses using Parse::RecDescent


I'm trying to use Parse::RecDescent make a parser which can parse parenthetical expressions and the unary operator ?.

What I have so far is failing when I create the parser because the rule expression is left-recursive:

use strict;
use warnings;
use Parse::RecDescent;

my $test = <<END;
((foo)? bar)
END

my $grammar = q(
    parse: expression(s)
    expression: string | parend | expression(s)
    parend : "(" (string | expression) ")" /\??/
    string : /\w+/ /\??/

);
my $parser = Parse::RecDescent->new($grammar);
my $result = $parser->parse($test);
if($result){
    print $result;
}else{
    print STDERR "Invalid grammar\n";
}

Solution

  • First, you go from lowest priority to highest priority.

    parse  : expr /\Z/
    
    expr   : list
    
    list   : unary(s?)
    
    unary  : unary '?'
           | term
    
    term   : '(' expr ')'
           | STRING
    
    STRING : /\w+/
    

    Of course,

    unary  : unary '?'
           | term
    

    doesn't work because it's left-recursive. Operator Associativity and Eliminating Left-Recursion in Parse::RecDescent can help you get rid of it. We get

    unary  : term unary_(s?)
    unary_ : '?'
    

    But that's not going to construct the right tree for us. So let's start by flattinging out the "(s?)".

    unary  : term unary_
    unary_ : '?' unary_
           |
    

    Then we can use subrule args to create the right tree.

    unary  : term unary_[ $item[1] ]
    unary_ : '?' unary_[ [ 'postfix?' => $arg[0] ] ]
           | { $arg[0] }
    

    All together:

    use strict;
    use warnings;
    use Data::Dumper      qw( Dumper );
    use Parse::RecDescent qw( );
    
    my $grammar = <<'END';
       {
          use strict;
          use warnings;
       }
    
       parse  : expr /\Z/ { $item[1] }
    
       expr   : list
    
       list   : unary(s?) { [ $item[0] => @{ $item[1] } ] }
    
       unary  : term unary_[ $item[1] ]
       unary_ : '?' unary_[ [ 'postfix?' => $arg[0] ] ]
              | { $arg[0] }
    
       term   : '(' expr ')' { $item[2] }
              | STRING { [ string => $item[1] ] }
    
       STRING : /\w+/
    
    END
    
    my $parser = Parse::RecDescent->new($grammar)
       or die "Invalid grammar\n";
    my $tree = $parser->parse("((foo bar)? baz)\n")
       or die "Invalid text\n";
    print(Dumper($tree));