bisonshift-reduce-conflict

Bison shift/reduce conflit


I get 2 shift/reduce errors when I try to compile the following grammar which is part of a larger one:

%token NAMESPACE IDENTIFIER
%start statement
%%
post_expression
    : IDENTIFIER
    | post_expression '[' expression ']'
    | post_expression '.' IDENTIFIER '(' ')'
    | post_expression '.' IDENTIFIER
    ;
expression
    : post_expression
    | expression '<' post_expression
    | expression '>' post_expression
    ;
data_type
    : IDENTIFIER
    | IDENTIFIER '<' data_type_list '>'
    | IDENTIFIER '<'  '>'
    | IDENTIFIER '['  ']'
    ;
statement
    : expression ';'
    | data_type initializer ';'
    ;
initializer
    : IDENTIFIER
    | IDENTIFIER '=' expression
    ;
data_type_list
    : data_type
    | data_type_list ',' data_type
    ;
%%

the conflict state is as follows:

State 1

    1 post_expression: IDENTIFIER .
    8 data_type: IDENTIFIER .
    9          | IDENTIFIER . '<' data_type_list '>'
   10          | IDENTIFIER . '<' '>'
   11          | IDENTIFIER . '[' ']'

    '['  shift, and go to state 6
    '<'  shift, and go to state 7

    IDENTIFIER  reduce using rule 8 (data_type)
    '['         [reduce using rule 1 (post_expression)]
    '<'         [reduce using rule 1 (post_expression)]
    $default    reduce using rule 1 (post_expression)

Could someone please explain how to fix this error? Is it possible to use precedence to solve the problem?


Solution

  • These two conflicts are the result of the parser being limited to a single lookahead token.

    Consider the two rules:

    post_expression  : post_expression '[' expression ']'
    data_type        : IDENTIFIER '[' ']'
    

    and think about what happens when the parser sees an IDENTIFIER followed by a [. Since IDENTIFIER would match post_expression, both of these rules are active. But the parser cannot proceed without deciding whether or not to reduce the IDENTIFIER to a post_expression, because in a shift-reduce parser, reduction actions must be taken at the end of the production, and no later. If the parser could look two tokens into the remaining input, there wouldn't be a problem; either the second-next token is a ] or it isn't (although in the real grammar, the decision might be more complicated).

    A very similar issue occurs with:

    expression
        : expression '<' post_expression
    data_type
        : IDENTIFIER '<' data_type_list '>'
        : IDENTIFIER '<'  '>'
    

    Again, the decision about whether to reduce IDENTIFIER to expression needs to be taken before the parser has enough information to make the decision.

    You can't fix these problems with precedence rules, because precedence rules don't extend the parser's lookahead. With or without precedence rules, the parser cannot make the correct decision without seeing (at least) the second-next token.

    This is not a conflict you can ignore. Bison will resolve the conflict one way or another (by default, it will choose the shift action, meaning that it will never do the reduction), but that resolution will not be correct for many inputs, and these will generate syntax errors because the parser has been forced into a dead-end.

    So, what to do?