parsinggrammarrecursive-descent

Removing left recursion from JSF*ck grammar rules


I am trying to figure out a JSF*uck grammar. I read the actual JavaScript grammar here and only took the grammar rules that were relevant to JSF*ck and got this:

Expression: AdditiveExpression ;
AdditiveExpression: UnaryExpression             
    | AdditiveExpression '+' UnaryExpression    
    ;
UnaryExpression: UpdateExpression
    | '+' UnaryExpression
    | '!' UnaryExpression
    ;
UpdateExpression: MemberExpression
    | UnaryExpression
    ;
MemberExpression: PrimaryExpression
    | MemberExpression '[' Expression ']'
    ;
PrimaryExpression: ArrayLiteral
    | ParenthesizedExpression
    ;
ParenthesizedExpression: '(' Expression ')' ;
ArrayLiteral: '[' ']' ;

Since I'm trying to implement a recursive descent parser, I tried removing left recursion by changing the grammar to this:

Expression: AdditiveExpression ;
AdditiveExpression: MemberExpression ('+' AdditiveExpression)? ;
MemberExpression: ('+'|'!') Expression
    | Atom ('[' Expression ']')?
    ; 
Atom: '[' Expression? ']' 
    | '(' Expression ')'
    ;

However, my AST doesn't get built correctly. For example, ![]+[] should be interpreted as addition(unary_not(array), array) but I get unary_not(addition(array, array)) instead. I thought that the unary operators would have a higher precendence since I've defined them further down from the topmost rule compared to addition ?

I've also read this answer and this one, but the answers were not really self-explanatory. Any guidance would be greatly appreciated.


Solution

  • What determines what can follow unary + or - is not so much the order of the productions ("further down from the top") but rather what follows the (’+'|'-') in the production. That's expression, and addition(array, array) is an expression, so that's how it's parsed

    You're hoping that even though you wrote expression, the parser will restrict it to only certain expressions, but it won't unless you tell it to. So what should you tell it? In this case:

    MemberExpression: ('+'|'!') MemberExpression
        | Atom ('[' Expression ']')?