bison yacc lex shift-reduce-conflict lalr

Solving small shift reduce conflict

I have had a shift-reduce conflict and reduced it to a couple of lines:

start               : instruction_A;

instruction_A       : instruction_A instruction
                    |
                    ;

instruction         : RETURN 'X'
                    | RETURN
                    | 'X' '!'
                    ;

3: shift/reduce conflict (shift 6, reduce 5) on 'X'
state 3
    instruction : RETURN . 'X'  (4)
    instruction : RETURN .  (5)

    'X'  shift 6
    $end  reduce 5
    RETURN  reduce 5

My guess is that the expression X! RETURN X! cannot be parsed because when the parser reaches the 'X' token, it does not know if it should continue the instruction or start another one.

Basically, the expected arrangements are:

{X! RETURN} {X!}

{X! RETURN X}

How to solve this? It seems like reading an extra token would solve the problem, but this must be done with LALR(1).

Solution

That's definitely the correct analysis of this shift-reduce conflict. The grammar is unambiguous but LALR(2).

There is a mechanical procedure to construct an LALR(1) grammar from a grammar which is LALR(k), and that could be applied to this example. But I didn't write it out because it's quite long, and I doubt that it is actually useful. I suspect that the original grammar was more like

statement
    : return_statement
    | expr_statement
    | ...
return_statement
    : "return" expression
    | "return"
expression_statement
    : expression '!'

Where X has been replaced with a non-terminal which can derive phrases of unbounded length. As a result, this grammar is not LALR(k) for any k. But it's still unambiguous.

So your best option is probably to ask Bison to generate a GLR parser, which can handle arbitrary unambiguous grammars. If you're using bison and the conflicted states occur rarely in practice, then the overhead of GLR is very small, since bison optimises GLR for deterministic states.