compiler-constructionbisonshift-reduce-conflictshift-reduce

Nested Shift/reduce conflict in bison?


I am new to this and I am trying to understand what is going on here, where I get two shift reduce conflicts. I have grammar (If I am missing something I can add the needed rules):

%start translation_unit

%token EOF      0               "end of file"

%token                      LBRA        "["
%token                      RBRA        "]"
%token                      LPAR        "("
%token                      RPAR        ")"
%token                      DOT         "."
%token                      ARROW       "->"
%token                      INCDEC      "++ or --"
%token                      COMMA       ","
%token                      AMP         "&"
%token                      STAR        "*"
%token                      ADDOP       "+ or -"
%token                      EMPH        "!"
%token                      DIVOP       "/ or %"
%token                      CMPO        "<, >, <=, or >="
%token                      CMPE        "== or !="
%token                      DAMP        "&&"
%token                      DVERT       "||"
%token                      ASGN        "="
%token                      CASS        "*=, /=, %=, +=, or -="
%token                      SEMIC       ";"
%token                      LCUR        "{"
%token                      RCUR        "}"
%token                      COLON       ":"

%token                      TYPEDEF     "typedef"
%token                      VOID        "void"
%token                      ETYPE       "_Bool, char, or int"
%token                      STRUCT      "struct"
%token                      ENUM        "enum"
%token                      CONST       "const"
%token                      IF          "if"
%token                      ELSE        "else"
%token                      DO          "do"
%token                      WHILE       "while"
%token                      FOR         "for"
%token                      GOTO        "goto"
%token                      CONTINUE    "continue"
%token                      BREAK       "break"
%token                      RETURN      "return"
%token                      SIZEOF      "sizeof"

%token<string>              IDF         "identifier"
%token<string>              TYPEIDF     "type identifier"
%token<int>                 INTLIT      "integer literal"
%token<string>              STRLIT      "string literal"

/////////////////////////////////

%%

/////////////////////////////////

primary_expression:
    IDF
    | INTLIT
    | STRLIT
    | LPAR expression RPAR
    ;

postfix_expression:
    primary_expression
    | postfix_expression LBRA expression RBRA
    | postfix_expression LPAR argument_expression_list_opt RPAR
    | postfix_expression DOT IDF
    | postfix_expression ARROW IDF
    | postfix_expression INCDEC
    ;

argument_expression_list_opt:
    %empty
    | argument_expression_list
    ;

argument_expression_list:
    assignment_expression
    | argument_expression_list COMMA assignment_expression
    ;

unary_expression:
    postfix_expression
    | INCDEC unary_expression
    | unary_operator cast_expression
    | SIZEOF LPAR type_name RPAR
    ;

unary_operator:
    AMP
    | STAR
    | ADDOP
    | EMPH
    ;

cast_expression:
    unary_expression
    ;

multiplicative_expression:
    cast_expression
    | multiplicative_expression STAR cast_expression
    | multiplicative_expression DIVOP cast_expression
    ;

additive_expression:
    multiplicative_expression
    | additive_expression ADDOP multiplicative_expression
    ;

relational_expression:
    additive_expression
    | relational_expression CMPO additive_expression
    ;

equality_expression:
    relational_expression
    | equality_expression CMPE relational_expression
    ;

logical_AND_expression:
    equality_expression
    | logical_AND_expression DAMP equality_expression
    ;

logical_OR_expression:
    logical_AND_expression
    | logical_OR_expression DVERT logical_AND_expression
    ;

assignment_expression:
    logical_OR_expression
    | unary_expression assignment_operator assignment_expression
    ;

assignment_operator:
    ASGN 
    | CASS
    ;

expression:
    assignment_expression
    ;

constant_expression:
    logical_OR_expression
    ;

declaration:
    no_leading_attribute_declaration
    ;

no_leading_attribute_declaration:
    declaration_specifiers init_declarator_list_opt SEMIC
    ;

init_declarator_list_opt:
    %empty
    | init_declarator_list
    ;

declaration_specifiers:
    declaration_specifier
    | declaration_specifier declaration_specifiers
    ;

declaration_specifier:
    storage_class_specifier
    | type_specifier_qualifier
    ;

init_declarator_list:
    init_declarator
    | init_declarator_list COMMA init_declarator
    ;

init_declarator:
    declarator
    ;

storage_class_specifier:
    TYPEDEF
    ;

type_specifier:
    VOID
    | ETYPE
    | struct_or_union_specifier
    | enum_specifier
    | typedef_name
    ;

struct_or_union_specifier:
    struct_or_union IDF LCUR member_declaration_list RCUR
    | struct_or_union IDF
    ;

struct_or_union:
    STRUCT
    ;

member_declaration_list:
    member_declaration
    | member_declaration_list member_declaration
    ;

member_declaration:
    specifier_qualifier_list member_declarator_list_opt SEMIC
    ;

member_declarator_list_opt:
    %empty
    | member_declarator_list
    ;

specifier_qualifier_list:
    type_specifier_qualifier
    | type_specifier_qualifier specifier_qualifier_list
    ;

type_specifier_qualifier:
    type_specifier
    | type_qualifier
    ;

member_declarator_list:
    member_declarator
    | member_declarator_list COMMA member_declarator
    ;

member_declarator:
    declarator
    ;

enum_specifier:
    ENUM IDF LCUR enumerator_list RCUR
    | ENUM IDF LCUR enumerator_list COMMA RCUR
    | ENUM IDF
    ;

enumerator_list:
    enumerator
    | enumerator_list COMMA enumerator
    ;

enumerator:
    ENUM
    | ENUM ASGN constant_expression
    ;

type_qualifier:
    CONST
    ;

pointer:
    STAR type_qualifier_list_opt
    | STAR type_qualifier_list_opt pointer
    ;

pointer_opt:
    %empty
    | pointer
    ;

declarator:
    pointer_opt direct_declarator
    ;

direct_declarator:
    IDF
    | LPAR declarator RPAR
    | array_declarator
    | function_declarator
    ;

abstract_declarator:
    pointer
    | pointer_opt direct_abstract_declarator
    ;

direct_abstract_declarator:
    LPAR abstract_declarator RPAR
    | array_abstract_declarator
    | function_abstract_declarator
    ;

direct_abstract_declarator_opt:
    %empty
    | direct_abstract_declarator
    ;

array_abstract_declarator:
    direct_abstract_declarator_opt LBRA assignment_expression RBRA
    ;

function_abstract_declarator:
    direct_abstract_declarator_opt LPAR parameter_type_list RPAR
    ;

array_declarator:
    direct_declarator LBRA assignment_expression RBRA
    ;

function_declarator:
    direct_declarator LPAR parameter_type_list RPAR
    ;

type_qualifier_list_opt:
    %empty
    | type_qualifier_list
    ;

type_qualifier_list:
    type_qualifier
    | type_qualifier_list type_qualifier
    ;

parameter_type_list:
    parameter_list
    ;

parameter_list:
    parameter_declaration
    | parameter_list COMMA parameter_declaration
    ;

parameter_declaration:
    declaration_specifiers declarator
    | declaration_specifiers abstract_declarator_opt
    ;
abstract_declarator_opt:
    %empty
    | abstract_declarator
    ;

type_name:
    specifier_qualifier_list abstract_declarator_opt
    ;

typedef_name:
    TYPEIDF
    ;

statement:
    matched
    | unmatched
    | other
    ;

other:
    expression_statement
    | compound_statement
    | jump_statement
    ;

matched:
    selection_statement_c
    | iteration_statement_c
    ;

unmatched:
    selection_statement_o
    | iteration_statement_o
    ;

compound_statement:
    LCUR block_item_list_opt RCUR
    ;

block_item_list_opt:
    %empty
    | block_item_list
    ;

block_item_list:
    block_item
    | block_item_list block_item
    ;

block_item:
    declaration
    | statement
    ;

expression_statement:
    expression_opt SEMIC
    ;

expression_opt:
    %empty
    | expression
    ;

selection_statement_o:
    IF LPAR expression RPAR statement
    | IF LPAR expression RPAR matched ELSE unmatched
    ;

selection_statement_c:
    IF LPAR expression RPAR matched ELSE matched
    ;

iteration_statement_o:
    WHILE LPAR expression RPAR unmatched
    | FOR LPAR expression_opt SEMIC expression_opt SEMIC expression_opt RPAR unmatched
    ;

iteration_statement_c:
    WHILE LPAR expression RPAR matched
    | DO statement WHILE LPAR expression RPAR SEMIC
    | FOR LPAR expression_opt SEMIC expression_opt SEMIC expression_opt RPAR matched
    ;

jump_statement:
    RETURN expression_opt SEMIC
    ;

translation_unit:
    external_declaration
    | translation_unit external_declaration
    ;

external_declaration:
    function_definition
    | declaration
    ;

function_definition:
    declaration_specifiers declarator compound_statement
    ;

/////////////////////////////////

%%

And I am getting this shift/reduce conflicts:

State 156

   85 declarator: pointer_opt . direct_declarator
   91 abstract_declarator: pointer_opt . direct_abstract_declarator

    "("           shift, and go to state 187
    "identifier"  shift, and go to state 44

    "("       [reduce using rule 111 (direct_abstract_declarator_opt)]
    ^^conflict with previous "(" ^^
    $default  reduce using rule 111 (direct_abstract_declarator_opt)

...

State 197

   91 abstract_declarator: pointer_opt . direct_abstract_declarator

    "("  shift, and go to state 213
    "("       [reduce using rule 111 (direct_abstract_declarator_opt)]
    ^^conflict^^

    $default  reduce using rule 111 (direct_abstract_declarator_opt)


So I am understanding this as : When I get "(" I can do two thigs. First, from direct_declarator I get LPAR declarator RPAR where LPAR is "("
... or ...
I can reduce direct_abstract_declarator->array_abstract_declarator->direct_abstract_declarator_opt->direct_abstract_declarator->LPAR abstract_declarator RPAR where LPAR is "(" again. So I can't decide which way to go right ?
How should I tackle this conflict ? Or am I seeing it completely wrong ?


Solution

  • One of the easiest ways to create a shift-reduce conflict is to introduce a new non-terminal representing an optional instance:

    something_opt
        : something
        | %empty
    

    Sometimes that will work, but more commonly it will turn out that there is some context in which something might or might not be optional. In other words, you have two different productions which use something:

    a: something something_else "END_A"
    b: something_opt something_else "END_B"
    

    That's not ambiguous, but it cannot be parsed with one token lookahead, because after something is recognised, the parser has to decide whether to reduce it to something_opt, which would be necessary if the following text turns out to be a b. something_opt is basically pointless semantically; there is a something or there is the absence of a something. And if it were not for that detail, the parser would have no problem; it could continue to parse something_else and then, depending on whether it encounters END_A or END_B, it can decide whether to reduce an a or a b.

    The fix is simple: instead of defining something_opt, simply duplicate the production so that there is one production with something and the other without.

    In your case, the offending optional non-terminal is direct_abstract_declarator_opt:

    direct_abstract_declarator_opt:
        %empty
        | direct_abstract_declarator
     
    array_abstract_declarator:
        direct_abstract_declarator_opt LBRA assignment_expression RBRA
    
    function_abstract_declarator:
        direct_abstract_declarator_opt LPAR parameter_type_list RPAR
    

    Those are the only places where it is used, so we can just get rid of it by duplicating the productions for the other two non-terminals:

    array_abstract_declarator:
        direct_abstract_declarator LBRA assignment_expression RBRA
        | LBRA assignment_expression RBRA
    
    function_abstract_declarator:
        direct_abstract_declarator LPAR parameter_type_list RPAR
        | LPAR parameter_type_list RPAR
    

    If at all possible, you should use this model to write grammars with optional non-terminals. It does require duplicating semantic actions, but hopefully the semantic action is just a simple call to an AST node builder. And it will save you a lot of unnecessary grief.