I've got the following (heavily stripped down) Happy grammar
%token
'{' { Langle }
'}' { Rangle }
'..' { DotDot }
'::' { ColonColon }
'@' { At }
mut { Mut }
ident { Ident }
%%
pattern
: binding_mode ident at_pat { error "identifier pattern" }
| expr_path { error "constant expression" }
| expr_path '{' '..' '}' { error "struct pattern" }
binding_mode
: mut { }
| { }
at_pat
: '@' pat { }
| { }
expr_path
: expr_path '::' ident { }
| ident { }
Which has shift/reduce conflicts around identifiers in patterns. By default, Happy chooses to shift, but in this case that isn't what I want: it tries to shoe-horn everything into constant expression
even when it could be an identifier pattern
.
I've read that precedence/associativity is the way to solve this sort of problem, but nothing I've added has been able to budge the grammar in the right direction (to be fair, I've been taking mostly shots in the dark).
Using some obvious tokenization, I would like to have:
x
to yield identifier pattern
mut x
to yield identifier pattern
std::pi
to yield constant expression
point{..}
to yield struct pattern
std::point{..}
to yield struct pattern
Basically, unless there is a {
or ::
token waiting to be consumed, an identifier should go to the identifier pattern
case.
I apologize if my question is unclear - part of the problem is that I'm having a tough time pinpointing what the problem even is. :(
First, it's important to understand what a shift is. A shift is the result of accepting the next input token, and putting it on the parser stack (where it will eventually be part of a production, but it is not necessary to know which one yet.) A reduction is taking zero or more tokens off the top of the stack which match the right-hand side of some production, and replacing them with the left-hand side.
When the parser decides to create an identifier pattern
out of binding_mode ident at_pat
where at_pat
is empty, it is not shifting; it is reducing. In fact, it reduces twice: first it reduces zero stacked symbols into an empty at_pat
and then it reduces the top three stack symbols into an identifier pattern
. Had there been no binding_mode
, it could have reduced ident
to expr_path
and then reduced the expr_path
into a constant_expression
. So that would be a reduce/reduce conflict.
But there is another issue, precisely because binding_mode
is nullable. When the parser sees an ident
, it does not know whether or not a binding_mode
would be possible, so it doesn't know whether to reduce an empty binding_mode
or to shift the ident
. That is a shift/reduce conflict. Since it prefers shift to reduce, it chooses to shift the ident
, which means that an empty binding_mode
cannot be produced , which in turn precludes the reduce/reduce conflict (and prevents ident @ pat
from being recognized at all.)
So to untangle all of that, we need to start by avoiding the necessity to reduce an empty binding_mode
. We do that by the usual nullable-production elimination algorithm, which involves making two copies of the right-hand side, one with the nullable nonterminal, and the other without; we then remove the nullable production. Once we do that, the reduce/reduce conflict appears.
To avoid the reduce/reduce conflict, we need to be explicit about which production is preferred. Reduce/reduce conflicts cannot be resolved through precedence declarations because the precedence algorithm always involves the comparison between a production (which could be reduced) and a terminal (which could be shifted). So the resolution must be explicit, and that means that we need to say that a bare ident
is a pattern, while an expr_path
which is not an ident
is a constant expression. That leaves us with the following:
(Note that I've used non-terminals to label the three different productions for pattern
, rather than relying on actions. For me that makes it easier to think about, and read.)
pattern: identifier_pattern | constant_expression | struct_pattern
Here is the null production elimination:
identifier_pattern: ident at_pat
| binding_mode ident at_pat
Here is the explicit prohibition on idents:
constant_expression: complex_expr_path
struct_pattern: expr_path '{' '..' '}'
binding_mode
is no longer nullable:
binding_mode: mut
at_pat
: '@' pat
| %empty
Here we create the two different expr_paths:
complex_expr_path
: complex_expr_path '::' ident
| ident '::' ident
expr_path: ident | complex_expr_path
I hope that solution has some relationship to your original grammar.