I am new to this and I am trying to understand what is going on here, where I get two shift reduce conflicts. I have grammar (If I am missing something I can add the needed rules):
%start translation_unit
%token EOF 0 "end of file"
%token LBRA "["
%token RBRA "]"
%token LPAR "("
%token RPAR ")"
%token DOT "."
%token ARROW "->"
%token INCDEC "++ or --"
%token COMMA ","
%token AMP "&"
%token STAR "*"
%token ADDOP "+ or -"
%token EMPH "!"
%token DIVOP "/ or %"
%token CMPO "<, >, <=, or >="
%token CMPE "== or !="
%token DAMP "&&"
%token DVERT "||"
%token ASGN "="
%token CASS "*=, /=, %=, +=, or -="
%token SEMIC ";"
%token LCUR "{"
%token RCUR "}"
%token COLON ":"
%token TYPEDEF "typedef"
%token VOID "void"
%token ETYPE "_Bool, char, or int"
%token STRUCT "struct"
%token ENUM "enum"
%token CONST "const"
%token IF "if"
%token ELSE "else"
%token DO "do"
%token WHILE "while"
%token FOR "for"
%token GOTO "goto"
%token CONTINUE "continue"
%token BREAK "break"
%token RETURN "return"
%token SIZEOF "sizeof"
%token<string> IDF "identifier"
%token<string> TYPEIDF "type identifier"
%token<int> INTLIT "integer literal"
%token<string> STRLIT "string literal"
/////////////////////////////////
%%
/////////////////////////////////
primary_expression:
IDF
| INTLIT
| STRLIT
| LPAR expression RPAR
;
postfix_expression:
primary_expression
| postfix_expression LBRA expression RBRA
| postfix_expression LPAR argument_expression_list_opt RPAR
| postfix_expression DOT IDF
| postfix_expression ARROW IDF
| postfix_expression INCDEC
;
argument_expression_list_opt:
%empty
| argument_expression_list
;
argument_expression_list:
assignment_expression
| argument_expression_list COMMA assignment_expression
;
unary_expression:
postfix_expression
| INCDEC unary_expression
| unary_operator cast_expression
| SIZEOF LPAR type_name RPAR
;
unary_operator:
AMP
| STAR
| ADDOP
| EMPH
;
cast_expression:
unary_expression
;
multiplicative_expression:
cast_expression
| multiplicative_expression STAR cast_expression
| multiplicative_expression DIVOP cast_expression
;
additive_expression:
multiplicative_expression
| additive_expression ADDOP multiplicative_expression
;
relational_expression:
additive_expression
| relational_expression CMPO additive_expression
;
equality_expression:
relational_expression
| equality_expression CMPE relational_expression
;
logical_AND_expression:
equality_expression
| logical_AND_expression DAMP equality_expression
;
logical_OR_expression:
logical_AND_expression
| logical_OR_expression DVERT logical_AND_expression
;
assignment_expression:
logical_OR_expression
| unary_expression assignment_operator assignment_expression
;
assignment_operator:
ASGN
| CASS
;
expression:
assignment_expression
;
constant_expression:
logical_OR_expression
;
declaration:
no_leading_attribute_declaration
;
no_leading_attribute_declaration:
declaration_specifiers init_declarator_list_opt SEMIC
;
init_declarator_list_opt:
%empty
| init_declarator_list
;
declaration_specifiers:
declaration_specifier
| declaration_specifier declaration_specifiers
;
declaration_specifier:
storage_class_specifier
| type_specifier_qualifier
;
init_declarator_list:
init_declarator
| init_declarator_list COMMA init_declarator
;
init_declarator:
declarator
;
storage_class_specifier:
TYPEDEF
;
type_specifier:
VOID
| ETYPE
| struct_or_union_specifier
| enum_specifier
| typedef_name
;
struct_or_union_specifier:
struct_or_union IDF LCUR member_declaration_list RCUR
| struct_or_union IDF
;
struct_or_union:
STRUCT
;
member_declaration_list:
member_declaration
| member_declaration_list member_declaration
;
member_declaration:
specifier_qualifier_list member_declarator_list_opt SEMIC
;
member_declarator_list_opt:
%empty
| member_declarator_list
;
specifier_qualifier_list:
type_specifier_qualifier
| type_specifier_qualifier specifier_qualifier_list
;
type_specifier_qualifier:
type_specifier
| type_qualifier
;
member_declarator_list:
member_declarator
| member_declarator_list COMMA member_declarator
;
member_declarator:
declarator
;
enum_specifier:
ENUM IDF LCUR enumerator_list RCUR
| ENUM IDF LCUR enumerator_list COMMA RCUR
| ENUM IDF
;
enumerator_list:
enumerator
| enumerator_list COMMA enumerator
;
enumerator:
ENUM
| ENUM ASGN constant_expression
;
type_qualifier:
CONST
;
pointer:
STAR type_qualifier_list_opt
| STAR type_qualifier_list_opt pointer
;
pointer_opt:
%empty
| pointer
;
declarator:
pointer_opt direct_declarator
;
direct_declarator:
IDF
| LPAR declarator RPAR
| array_declarator
| function_declarator
;
abstract_declarator:
pointer
| pointer_opt direct_abstract_declarator
;
direct_abstract_declarator:
LPAR abstract_declarator RPAR
| array_abstract_declarator
| function_abstract_declarator
;
direct_abstract_declarator_opt:
%empty
| direct_abstract_declarator
;
array_abstract_declarator:
direct_abstract_declarator_opt LBRA assignment_expression RBRA
;
function_abstract_declarator:
direct_abstract_declarator_opt LPAR parameter_type_list RPAR
;
array_declarator:
direct_declarator LBRA assignment_expression RBRA
;
function_declarator:
direct_declarator LPAR parameter_type_list RPAR
;
type_qualifier_list_opt:
%empty
| type_qualifier_list
;
type_qualifier_list:
type_qualifier
| type_qualifier_list type_qualifier
;
parameter_type_list:
parameter_list
;
parameter_list:
parameter_declaration
| parameter_list COMMA parameter_declaration
;
parameter_declaration:
declaration_specifiers declarator
| declaration_specifiers abstract_declarator_opt
;
abstract_declarator_opt:
%empty
| abstract_declarator
;
type_name:
specifier_qualifier_list abstract_declarator_opt
;
typedef_name:
TYPEIDF
;
statement:
matched
| unmatched
| other
;
other:
expression_statement
| compound_statement
| jump_statement
;
matched:
selection_statement_c
| iteration_statement_c
;
unmatched:
selection_statement_o
| iteration_statement_o
;
compound_statement:
LCUR block_item_list_opt RCUR
;
block_item_list_opt:
%empty
| block_item_list
;
block_item_list:
block_item
| block_item_list block_item
;
block_item:
declaration
| statement
;
expression_statement:
expression_opt SEMIC
;
expression_opt:
%empty
| expression
;
selection_statement_o:
IF LPAR expression RPAR statement
| IF LPAR expression RPAR matched ELSE unmatched
;
selection_statement_c:
IF LPAR expression RPAR matched ELSE matched
;
iteration_statement_o:
WHILE LPAR expression RPAR unmatched
| FOR LPAR expression_opt SEMIC expression_opt SEMIC expression_opt RPAR unmatched
;
iteration_statement_c:
WHILE LPAR expression RPAR matched
| DO statement WHILE LPAR expression RPAR SEMIC
| FOR LPAR expression_opt SEMIC expression_opt SEMIC expression_opt RPAR matched
;
jump_statement:
RETURN expression_opt SEMIC
;
translation_unit:
external_declaration
| translation_unit external_declaration
;
external_declaration:
function_definition
| declaration
;
function_definition:
declaration_specifiers declarator compound_statement
;
/////////////////////////////////
%%
And I am getting this shift/reduce conflicts:
State 156
85 declarator: pointer_opt . direct_declarator
91 abstract_declarator: pointer_opt . direct_abstract_declarator
"(" shift, and go to state 187
"identifier" shift, and go to state 44
"(" [reduce using rule 111 (direct_abstract_declarator_opt)]
^^conflict with previous "(" ^^
$default reduce using rule 111 (direct_abstract_declarator_opt)
...
State 197
91 abstract_declarator: pointer_opt . direct_abstract_declarator
"(" shift, and go to state 213
"(" [reduce using rule 111 (direct_abstract_declarator_opt)]
^^conflict^^
$default reduce using rule 111 (direct_abstract_declarator_opt)
So I am understanding this as : When I get "(" I can do two thigs. First, from direct_declarator I get LPAR declarator RPAR where LPAR is "("
... or ...
I can reduce direct_abstract_declarator->array_abstract_declarator->direct_abstract_declarator_opt->direct_abstract_declarator->LPAR abstract_declarator RPAR where LPAR is "(" again. So I can't decide which way to go right ?
How should I tackle this conflict ? Or am I seeing it completely wrong ?
One of the easiest ways to create a shift-reduce conflict is to introduce a new non-terminal representing an optional instance:
something_opt
: something
| %empty
Sometimes that will work, but more commonly it will turn out that there is some context in which something
might or might not be optional. In other words, you have two different productions which use something
:
a: something something_else "END_A"
b: something_opt something_else "END_B"
That's not ambiguous, but it cannot be parsed with one token lookahead, because after something
is recognised, the parser has to decide whether to reduce it to something_opt
, which would be necessary if the following text turns out to be a b
. something_opt
is basically pointless semantically; there is a something
or there is the absence of a something
. And if it were not for that detail, the parser would have no problem; it could continue to parse something_else
and then, depending on whether it encounters END_A
or END_B
, it can decide whether to reduce an a
or a b
.
The fix is simple: instead of defining something_opt
, simply duplicate the production so that there is one production with something
and the other without.
In your case, the offending optional non-terminal is direct_abstract_declarator_opt
:
direct_abstract_declarator_opt:
%empty
| direct_abstract_declarator
array_abstract_declarator:
direct_abstract_declarator_opt LBRA assignment_expression RBRA
function_abstract_declarator:
direct_abstract_declarator_opt LPAR parameter_type_list RPAR
Those are the only places where it is used, so we can just get rid of it by duplicating the productions for the other two non-terminals:
array_abstract_declarator:
direct_abstract_declarator LBRA assignment_expression RBRA
| LBRA assignment_expression RBRA
function_abstract_declarator:
direct_abstract_declarator LPAR parameter_type_list RPAR
| LPAR parameter_type_list RPAR
If at all possible, you should use this model to write grammars with optional non-terminals. It does require duplicating semantic actions, but hopefully the semantic action is just a simple call to an AST node builder. And it will save you a lot of unnecessary grief.