[SOLVED] JAVACC - Switching Lexical States Based on Context

I'm working on a parser that, based on a specific context, may support different tokens. Here's a simplified example:

<DEFAULT> TOKEN [IGNORE_CASE] : {
    < OPERATION: "op" > : OperationType |
    < OBJ0: "obj0" > : ExtendedContext |
    < OBJ1: "obj1" > : BaseContext 
}

<BaseContext, ExtendedContext> TOKEN [IGNORE_CASE] : {
     < ARG0:  "arg0"  > 
}

<ExtendedContext> TOKEN [IGNORE_CASE] : {
     < ARG1:  "arg1"  > 
}

The problem is that I can reach those contexts from different lexical states. Let's say:

<OperationType> TOKEN :
{
    < MODIFY : "modify" > : BaseContext, ExtendedContext
}

Of course, I understand that I cannot specify both lexical states here, but I would need something similar.

I've attempted to implement a SwitchTo strategy based on the context, by defining functions that determine whether the operation belongs to an ExtendedContext or a BaseContext. However, this approach seems to break some functionality, and I'm not sure whether it would work as expected or whether there is a better way to address the issue.

Example of a solution that I tried (it does not work in all scenarios):

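TOKEN_MGR_DECLS : {
    // Context state to enter next; stays BaseContext until an object token changes it.
    int contextLexState = BaseContext;

    // Remember the context and switch to it immediately.
    void moveToContext(int contextLexState) {
        setLexStateContextNoSwitch(contextLexState);
        switchToContext();
    }

    // Switch the token manager to the remembered context.
    void switchToContext() {
        SwitchTo(contextLexState);
    }

    // Remember the context without changing the current lexical state.
    void setLexStateContextNoSwitch(int contextLexState) {
        this.contextLexState = contextLexState;
    }
}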

<DEFAULT> TOKEN [IGNORE_CASE] : {
    < OPERATION: "op" > : OperationType |
    // A lexical action is a Java block placed directly after the regular expression (no colon before it).
    < OBJ0: "obj0" > { moveToContext(ExtendedContext); } |
    < OBJ1: "obj1" > { moveToContext(BaseContext); }
}

<OperationType> TOKEN :
{
    < MODIFY : "modify" > { switchToContext(); }
}
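
To make the state transitions of this attempt easier to follow, here is a minimal plain-Java simulation. It is only a sketch and not JavaCC-generated code: the class ContextLexerSketch, the State enum and the trace method are hypothetical names, input is assumed to be whitespace-separated, and case-insensitive matching is left out. The keyword sets and transitions are copied from the token definitions above.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hand-written stand-in for the token manager; hypothetical, not JavaCC output.
class ContextLexerSketch {
    enum State { DEFAULT, OPERATION_TYPE, BASE_CONTEXT, EXTENDED_CONTEXT }

    // Keywords recognized in each lexical state, mirroring the TOKEN blocks above.
    private static final Map<State, Set<String>> TOKENS = new EnumMap<>(State.class);
    static {
        TOKENS.put(State.DEFAULT, Set.of("op", "obj0", "obj1"));
        TOKENS.put(State.OPERATION_TYPE, Set.of("modify"));
        TOKENS.put(State.BASE_CONTEXT, Set.of("arg0"));
        TOKENS.put(State.EXTENDED_CONTEXT, Set.of("arg0", "arg1"));
    }

    private State current = State.DEFAULT;
    // Same default as contextLexState in TOKEN_MGR_DECLS.
    private State remembered = State.BASE_CONTEXT;

    // Returns the state entered after each word; throws if a word is not a token in the current state.
    List<State> trace(String input) {
        List<State> states = new ArrayList<>();
        for (String word : input.trim().split("\\s+")) {
            if (!TOKENS.get(current).contains(word)) {
                throw new IllegalStateException("'" + word + "' is not a token in state " + current);
            }
            switch (word) {
                case "op":     current = State.OPERATION_TYPE; break;
                case "obj0":   remembered = State.EXTENDED_CONTEXT; current = remembered; break; // moveToContext
                case "obj1":   remembered = State.BASE_CONTEXT; current = remembered; break;     // moveToContext
                case "modify": current = remembered; break;                                      // switchToContext
                default:       break; // arg0 and arg1 do not change the state
            }
            states.add(current);
        }
        return states;
    }

    public static void main(String[] args) {
        System.out.println(new ContextLexerSketch().trace("obj0 arg0"));
        System.out.println(new ContextLexerSketch().trace("op modify"));
    }
}
```

Tracing "op modify" on a fresh instance ends in the default remembered context, because no object token has set it yet at that point.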

The parser should correctly parse something like:

op modify obj0 arg0

op modify obj1 arg1

obj1 arg0

obj0 arg0 ...

But not these:

op modify obj0 arg1

obj0 arg1

since arg1 belongs only to the extended context.

Any help would be useful! Thanks.

Solution

Legacy JavaCC does not really deal with this problem well. The main reason it tends not to work is a longstanding problem: LOOKAHEAD does not work in conjunction with lexical states.

You ought to do yourself a favor and consider using CongoCC, which is a much more advanced version of the JavaCC tool. In particular, there are some articles on this whole context-sensitive tokenization problem here, and CongoCC has a key feature for it: the ability to turn tokens on and off in a given context. See here. If you have any further questions, you might consider asking them here.
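
The general idea of turning a token on only within a given context, rather than switching whole lexical states, can be illustrated in plain Java. This is a conceptual sketch only: it is not CongoCC syntax or its API, and TokenActivationSketch, accepts and withActive are hypothetical names chosen for the illustration. It assumes whitespace-separated input and a fixed set of keywords taken from the question.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Supplier;

// Conceptual illustration of per-context token activation; hypothetical, not a CongoCC API.
class TokenActivationSketch {
    // Tokens that are active by default; arg1 is deliberately left out.
    private final Set<String> active = new HashSet<>(Set.of("obj0", "obj1", "arg0"));

    // True if every whitespace-separated word is a currently active token.
    boolean accepts(String input) {
        for (String word : input.trim().split("\\s+")) {
            if (!active.contains(word)) {
                return false;
            }
        }
        return true;
    }

    // Runs body with the given token switched on, then restores the previous set of active tokens.
    <T> T withActive(String token, Supplier<T> body) {
        boolean added = active.add(token);
        try {
            return body.get();
        } finally {
            if (added) {
                active.remove(token);
            }
        }
    }

    public static void main(String[] args) {
        TokenActivationSketch sketch = new TokenActivationSketch();
        System.out.println(sketch.accepts("arg0 arg1"));
        System.out.println(sketch.withActive("arg1", () -> sketch.accepts("arg0 arg1")));
    }
}
```

The point of the design is that the parser rule for the extended context decides when arg1 is recognized, so the tokenizer does not need a separate lexical state for each combination of contexts.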