I'm new to Bison
and I'm trying to write a parser. I already wrote a scanner in flex. I came up with the following grammar for the parser:
%token NUMBER
%token IDENTIFIER
%start Program
%%
Program: Stats;
Stats: Stats Stat ";"
| /* Epsilon */
;
Stat: "var" IDENTIFIER ":=" Expr
| Lexpr ":=" Expr
| Expr;
Lexpr: IDENTIFIER
| "head" Expr
| "tail" Expr;
Expr: Term
| UnaryOperatorChain Term;
UnaryOperatorChain: UnaryOperatorChain UnaryOperator
| UnaryOperator;
UnaryOperator: "head" | "tail" | "not" | "islist";
Term:
"(" Expr ")"
| NUMBER
| IDENTIFIER;
%%
I get this error:
14 shift/reduce conflicts
shift/reduce conflict on token "head"
I think the problem is that the parser cannot distinguish between Lexpr and Expr and therefore does not know which rule to use. I tried several hours to rearrange my grammar, but nothing seems to work. Thanks in advance :)
Running with bison -v
generates a file with the extension .output
. This helps you find the reason for the conflicts. In your case, it begins with:
State 7 conflicts: 7 shift/reduce
State 8 conflicts: 7 shift/reduce
and later on, for state 7 (similarly for state 8):
State 7
8 Lexpr: "head" • Expr
14 UnaryOperator: "head" •
NUMBER shift, and go to state 4
IDENTIFIER shift, and go to state 19
"head" shift, and go to state 20
"tail" shift, and go to state 21
"not" shift, and go to state 9
"islist" shift, and go to state 10
"(" shift, and go to state 11
NUMBER [reduce using rule 14 (UnaryOperator)]
IDENTIFIER [reduce using rule 14 (UnaryOperator)]
"head" [reduce using rule 14 (UnaryOperator)]
"tail" [reduce using rule 14 (UnaryOperator)]
"not" [reduce using rule 14 (UnaryOperator)]
"islist" [reduce using rule 14 (UnaryOperator)]
"(" [reduce using rule 14 (UnaryOperator)]
Expr go to state 22
UnaryOperatorChain go to state 15
UnaryOperator go to state 16
Term go to state 17
State 7 corresponds to the item set consisting of the two items in the first two lines. In both items, you have just read the token "head"
. The following lines show that Bison does not know what to do when, e.g., a token NUMBER
follows, or a token IDENTIFIER
, and so on.
19 Term: NUMBER •
, thus proceeding with the expression that can follow head
in rule 8; orhead
is a unary operator.This is telling you that your expression language is ambiguous. With experience you should already be able to find the cause of the ambiguity. If not, running again with bison -Wcounterexamples
can help you there too:
First example: Stats "head" • "head" Term ":=" Expr ";" $end
Shift derivation
$accept
↳ 0: Stats $end
↳ 2: Stats Stat ";"
↳ 5: Lexpr ":=" Expr
↳ 8: "head" Expr
↳ 11: UnaryOperatorChain Term
↳ 13: UnaryOperator
↳ 14: • "head"
Second example: Stats "head" • "head" Term ";" $end
Reduce derivation
$accept
↳ 0: Program $end
↳ 1: Stats
↳ 2: Stats Stat ";"
↳ 6: Expr
↳ 11: UnaryOperatorChain Term
↳ 12: UnaryOperatorChain UnaryOperator
↳ 13: UnaryOperator ↳ 14: "head"
↳ 14: "head" •
This tells you that Bison cannot decide how to parse a string prefix such as head head id
. (In fact, even head id
would work here, as a counterexample.) Remember that Bison can only parse LALR(1) grammars, i.e., with one look-ahead symbol. When the first head
is read and the second head
is the look-ahead symbol, Bison needs to decide if this is part of a Lexpr
(which will be later followed by a :=
) or it's part of an Expr
(which will be followed by a ;
). The difference between the two will become apparent when the token :=
is (or is not) found, later on. But, in order to see this token, we would need an arbitrary number of look-ahead symbols, as the size of the Expr
that follows head
in a Lexpr
can be arbitrarily large.
So, the problem here is the common subset of the languages generated by Lexpr
and Expr
, in the two alternative rules for Stat
.
I'm afraid that there's no easy solution for this. You will have to refactor your grammar and eliminate the ambiguity. I think you should give it a shot, so I'm not giving any spoiler. Hint: try to avoid deciding what kind of an expression you have, as long as you see a prefix like head tail head head ...
.