
How to implement a negative LOOKAHEAD check for a token in JavaCC?


I am currently implementing a JavaScript/ECMAScript 5.1 parser with JavaCC. I recently learned about LOOKAHEADs, which are handy here as the grammar is not fully LL(1).

One of the things I see in the ECMAScript grammar is "negative lookahead check", like in the following ExpressionStatement production:

ExpressionStatement :
    [lookahead ∉ {{, function}] Expression ;

So I'll probably need something like LOOKAHEAD(!("{" | "function")), but that is not valid JavaCC syntax.

My question is: how could I implement this "negative LOOKAHEAD" in JavaCC?

After reading the LOOKAHEAD MiniTutorial I think that an expression like getToken(1).kind != FUNCTION may be what I need, but I am not quite sure about it.


Solution

  • For the example you provide, I would prefer to use syntactic lookahead, which is, in a sense, necessarily "positive".

    The production for ExpressionStatement is not the place to tackle the problem, as there is no choice in it.

    void ExpressionStatement() : {} { Expression() ";" }
    

    The problem will arise where there is a choice between an expression statement and a block or between an expression statement and a function declaration (or both).

    E.g. in Statement you will find

    void Statement() :{} {
        ...
    |
        Block()
    |
        ExpressionStatement() 
    |   ...
    }
    

    This gives a warning because both choices can start with a "{" (in ECMAScript an expression may begin with an object literal). You have two options. One is to ignore the warning: JavaCC takes the first choice that matches, so all will be well as long as Block comes first. The other is to suppress the warning with a lookahead specification, like this:

    void Statement() :{} {
        ...
    |
        LOOKAHEAD("{") Block()
    |
        ExpressionStatement() 
    |   ...
    }
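One way to see where the warning comes from: with one token of lookahead, JavaCC reports a choice conflict when two alternatives can begin with the same token. The Java sketch below is not JavaCC code; the Kind enum and both token sets are hypothetical and abbreviated, and the ExpressionStatement set is taken as the production is written above, i.e. without the spec's lookahead restriction. It intersects the sets of tokens each alternative can start with.

```java
import java.util.EnumSet;
import java.util.Set;

// Illustrative sketch: a one-token choice conflict means the sets of tokens that
// can start two alternatives (their FIRST sets) overlap.
// Kind and the set contents are a hypothetical, abbreviated subset of ES5 tokens.
public class FirstSetConflict {
    enum Kind { LBRACE, FUNCTION, IDENT, NUMBER, STRING, LPAREN, LBRACKET, THIS, NEW }

    // Block : "{" StatementList? "}"
    static final Set<Kind> FIRST_BLOCK = EnumSet.of(Kind.LBRACE);

    // Expression ";" without the spec's restriction: an expression may start with an
    // object literal "{", a function expression, an identifier, a literal, "(", an
    // array literal "[", "this", "new", ... (abbreviated).
    static final Set<Kind> FIRST_EXPRESSION_STATEMENT = EnumSet.of(
            Kind.LBRACE, Kind.FUNCTION, Kind.IDENT, Kind.NUMBER, Kind.STRING,
            Kind.LPAREN, Kind.LBRACKET, Kind.THIS, Kind.NEW);

    // Tokens on which one token of lookahead cannot tell the two alternatives apart.
    static Set<Kind> conflict(Set<Kind> a, Set<Kind> b) {
        Set<Kind> both = EnumSet.copyOf(a);
        both.retainAll(b);
        return both;
    }

    public static void main(String[] args) {
        System.out.println(conflict(FIRST_BLOCK, FIRST_EXPRESSION_STATEMENT));
    }
}
```

Running it prints [LBRACE]: "{" is the only shared starting token, which is exactly the case LOOKAHEAD("{") settles in favour of Block. FUNCTION would show up the same way if ExpressionStatement were compared against a function-declaration alternative instead.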
    

    Syntactic lookahead is, in a sense, positive: "take this alternative if X".

    If you really want a negative lookahead (i.e., "take this alternative if not X"), it has to be semantic.

    In the case of Statement you could write

    void Statement() :{} {
        ...
    |
        LOOKAHEAD({ getToken(1).kind != LBRACE }) ExpressionStatement()
    |
        Block()
    }
    

    I made sure that these are the last two alternatives, since otherwise you would need to add more tokens to the set that blocks ExpressionStatement(); e.g., it should not be chosen if the next token is "if", "while", "for", etc.
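The ordering argument can be mimicked in plain Java. This is a hand-written analogue of the Statement() choice, not the parser JavaCC generates; the Kind enum and the alternative names are hypothetical. choose() tries the keyword alternatives first and then applies the negative check for "{" and "function" from the spec, while chooseNegativeFirst() puts the negative check first to show what goes wrong.

```java
import java.util.List;

// Hypothetical, hand-written analogue of the choice made in Statement().
// It is NOT JavaCC-generated code; Kind is an illustrative subset of token kinds.
public class StatementChoice {
    enum Kind { IF, WHILE, SEMICOLON, LBRACE, FUNCTION, IDENT, NUMBER }

    // Alternatives are tried top to bottom; the first whose lookahead succeeds wins.
    static String choose(Kind next) {
        // Alternatives that start with a distinctive token come first.
        if (next == Kind.IF) return "IfStatement";
        if (next == Kind.WHILE) return "WhileStatement";
        if (next == Kind.SEMICOLON) return "EmptyStatement";
        // Negative (semantic) lookahead from the spec:
        // ExpressionStatement : [lookahead not in {"{", "function"}] Expression ;
        if (next != Kind.LBRACE && next != Kind.FUNCTION) return "ExpressionStatement";
        // Only "{" and "function" can still reach this point.
        if (next == Kind.LBRACE) return "Block";
        throw new IllegalStateException("no Statement alternative starts with " + next);
    }

    // Same alternatives, but with the negative check placed first: it now also
    // claims IF, WHILE, ... unless those kinds are added to the excluded set.
    static String chooseNegativeFirst(Kind next) {
        if (next != Kind.LBRACE && next != Kind.FUNCTION) return "ExpressionStatement";
        if (next == Kind.LBRACE) return "Block";
        if (next == Kind.IF) return "IfStatement"; // never taken: IF was claimed above
        throw new IllegalStateException("no Statement alternative starts with " + next);
    }

    public static void main(String[] args) {
        for (Kind k : List.of(Kind.IF, Kind.IDENT, Kind.LBRACE)) {
            System.out.println(k + ": " + choose(k) + " / negative-first: " + chooseNegativeFirst(k));
        }
    }
}
```

For IF, choose() returns "IfStatement" while chooseNegativeFirst() returns "ExpressionStatement": once the negative check comes first, every token it does not exclude goes to ExpressionStatement. In a .jj file the corresponding semantic lookahead for the full spec set would be getToken(1).kind != LBRACE && getToken(1).kind != FUNCTION, assuming tokens named LBRACE and FUNCTION.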

    On the whole, you are better off using syntactic lookahead when you can. It is usually more straightforward and harder to mess up.