parsingfinite-automatamenhir

How does a LR(1) parser handles empty rules?


I have worked with a few parsers (Yacc, Bison and Menhir). If I remember correctly, all of them allow for a rule to be empty. Here is an example of what I mean using Menhir, it is the one I have used the most.

some_list:
 | {[]}
 | some_non_empty_list { $1 }

some_non_empty_list:
 | SEMICOLON some_list { $2 }
 | element { [$1] }
 | element some_non_empty_list { $1 :: $2 }

The important part is that some_list that can reduce on nothingness.

My current understanding of the algorithm to build a parsing table (build NFA, build DFA from the NFA, minimize) leads me to think that this would lead to shift/reduce conflicts all over the place. But it clearly doesn't, because my code worked back then.

So how to build a parsing table that can accept those empty rules?


Solution

  • Why do you think that an empty rule is any harder to handle than a rule with one right-hand-side token?

    Oversimplifying, a grammar rule L = R1 R2 R3 ; means "reduce to L if you see R1 R2 R3". Unsimplifying, if we have X= A L B ; then our L rule means "reduce to L if your left context is A, you have seen R1 R2 R3, and the next token is first(B).

    This idea is the same if L = R1 R2 ; and L = R1 ;.

    And even for the limiting case of the (empty rule): L = ;

    You can't reduce to L unless you have seen its left context, its content, and the beginning of what follows. So you can't reduce to an empty rule at "any time".

    What you need to do is learn how LR parsers work, by learning how to track item sets in while bulding parse states. Do this on paper, once, (painful yes, worth it yes) for a small grammar, and LR parsers will all become clear. You can find this process described in any book on LR parsing including the classic Compilers by Aho et al.