I am trying to figure out a JSF*uck grammar. I read the actual JavaScript grammar here and only took the grammar rules that were relevant to JSF*ck and got this:
Expression: AdditiveExpression ;
AdditiveExpression: UnaryExpression
| AdditiveExpression '+' UnaryExpression
;
UnaryExpression: UpdateExpression
| '+' UnaryExpression
| '!' UnaryExpression
;
UpdateExpression: MemberExpression
| UnaryExpression
;
MemberExpression: PrimaryExpression
| MemberExpression '[' Expression ']'
;
PrimaryExpression: ArrayLiteral
| ParenthesizedExpression
;
ParenthesizedExpression: '(' Expression ')' ;
ArrayLiteral: '[' ']' ;
Since I'm trying to implement a recursive descent parser, I tried removing left recursion by changing the grammar to this:
Expression: AdditiveExpression ;
AdditiveExpression: MemberExpression ('+' AdditiveExpression)? ;
MemberExpression: ('+'|'!') Expression
| Atom ('[' Expression ']')?
;
Atom: '[' Expression? ']'
| '(' Expression ')'
;
However, my AST doesn't get built correctly. For example, ![]+[]
should be interpreted as addition(unary_not(array), array)
but I get unary_not(addition(array, array))
instead. I thought that the unary operators would have a higher precendence since I've defined them further down from the topmost rule compared to addition ?
I've also read this answer and this one, but the answers were not really self-explanatory. Any guidance would be greatly appreciated.
What determines what can follow unary +
or -
is not so much the order of the productions ("further down from the top") but rather what follows the (’+'|'-')
in the production. That's expression
, and addition(array, array)
is an expression, so that's how it's parsed
You're hoping that even though you wrote expression
, the parser will restrict it to only certain expressions, but it won't unless you tell it to. So what should you tell it? In this case:
MemberExpression: ('+'|'!') MemberExpression
| Atom ('[' Expression ']')?