I have two snippets of Happy code here, one that uses normal precedence rules, and one that uses context-dependent precedence rules (both of which are described here).
Normal:
%left '+'
%left '*'
%%
Exp :: { Exp }
: Exp '+' Exp { Plus $1 $3 }
| Exp '*' Exp { Times $1 $3 }
| var { Var $1 }
Context-dependent:
%left PLUS
%left TIMES
%%
Exp :: { Exp }
: Exp '+' Exp %prec PLUS { Plus $1 $3 }
| Exp '*' Exp %prec TIMES { Times $1 $3 }
| var { Var $1 }
Given the input:
a * b + c * d
The normal version gives:
Plus (Times (Var "a") (Var "b")) (Times (Var "c") (Var "d"))
whereas the context-dependent version gives:
Times (Var "a") (Plus (Var "b") (Times (Var "c") (Var "c")))
Shouldn't these both give the same output? What am I doing wrong here that's making them generate different parse trees?
"Context-dependent precedence" is a very misleading way of describing that feature. The description of the precedence algorithm in the preceding section is largely accurate, though.
As it says, a precedence comparison is always between a production (which could be reduced) and a terminal (which could be shifted). That simple fact is often clouded by the decision to design the precedence declaration syntax as though precedence were solely an attribute of the terminal.
The production's precedence is set by copying the precedence of the last terminal in the production unless there is an explicit declaration with %prec
. Or to put it another way, the production's precdence is set with a %prec
clause, defaulting to the precedence of the last token. Either way, you can only define the precedence of the production by saying that it is the same as that of some terminal. Since that is not always convenient, the parser generator gives you the option of using an arbitrary name which is not the name of a grammar symbol. The implementation is to treat the name as a terminal and ignore the fact that it is never actually used in any grammar rule, but logically it is the name of a precedence level which is to be assigned to that particular production.
In your first example, you let the productions default their precedence to the last (in fact, the only) terminal in each production. But in your second example you have defined two named precedence levels, PLUS and TIMES and you use those to set the precedence of two productions. But you do not declare the precedence of any terminal. So when the parser generator attempts to check the relative precedence of the production which could be reduced and the terminal which could be shifted, it finds that only one of those things has a declared precedence. And in that case, it always shifts.