I have a problem with a [relatively simple] grammar, a distilled version of a much bigger grammar. Bison reports one shift/reduce conflict, which is resolved incorrectly: the resulting parser fails to parse valid language. I am not kidding: I intermittently spent more than a year trying to eliminate the conflict but failed. ChatGPT is uncooperative :) Please advise.
%token YY_WORD_NAME
%left YY_IN
%%
word_exp:
word_exp YY_IN set_exp
| YY_WORD_NAME
set_exp:
'{' word_exp '}'
| '{' YY_WORD_NAME YY_IN set_exp ':' word_exp '}'
%%
The basic problem you have here is lookahead -- in order to decide that a YY_WORD_NAME
is not a word_exp
, you need to look ahead through the following YY_IN set_expr
(and unbounded number of tokens) to see the :
.
The easiest fix for this is probably to "relax" the grammar a bit (accept some constructs that are not legal), and then have a post-pass that flags the erroneous conditions. If you use:
word_exp:
word_exp YY_IN set_exp
| YY_WORD_NAME
set_exp:
'{' word_exp '}'
| '{' word_exp ':' word_exp '}'
That fixes the conflict, but accepts things that don't have exactly one YY_IN
before the :
like:
{ A in { B } in { C } : D }
{ A : B }
(here I'm assuming YY_IN
is an in
keyword and YY_WORD_NAME
is just a non-keyword identifier)
You can even flag this as an error in the bison action immediately after the incorrect construct (rather than by building a data structure you later traverse, though you may be doing that anyway for other reasons)
%token YY_WORD_NAME
%left YY_IN
%union {
struct {
int incnt;
// any other info needed from word_exp
} word_exp;
}
%type<word_exp> word_exp
%%
word_exp:
word_exp YY_IN set_exp { $$.incnt = $1.incnt + 1; }
| YY_WORD_NAME { $$.incnt = 0; }
set_exp:
'{' word_exp '}'
| '{' word_exp ':' word_exp '}' { if ($2.incnt != 1) {
yyerror("Syntax error");
YYERROR; /* or YYABORT */
}
}
%%