
JAVACC - Switching Lexical States Based on Context


I'm working on a parser that, based on a specific context, may support different tokens. Here's a simplified example:

<DEFAULT> TOKEN [IGNORE_CASE] : {
    < OPERATION: "op" > : OperationType |
    < OBJ0: "obj0" > : ExtendedContext |
    < OBJ1: "obj1" > : BaseContext 
}

<BaseContext, ExtendedContext> TOKEN [IGNORE_CASE] : {
     < ARG0:  "arg0"  > 
}

<ExtendedContext> TOKEN [IGNORE_CASE] : {
     < ARG1:  "arg1"  > 
}

The problem is that I can reach those contexts from different lexical states. Let's say:

<OperationType> TOKEN :
{
    < MODIFY : "modify" > : BaseContext, ExtendedContext
}

Of course, I understand that I cannot specify both lexical states here, but I would need something similar.


I've attempted to implement a SwitchTo strategy based on the context by defining functions that determine whether the operation belongs to an ExtendedContext or a BaseContext. However, this approach seems to break some functionality, and I'm not sure whether it would work as expected or whether there is a better way to address the issue.

Example of a solution that I tried (but it does not work in all scenarios):

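TOKEN_MGR_DECLS : {
    // Lexical state to enter once the context is known; defaults to BaseContext.
    int contextLexState = BaseContext;

    // Remember the given context and switch to it immediately.
    void moveToContext(int contextLexState) {
        setLexStateContextNoSwitch(contextLexState);
        switchToContext();
    }

    // Switch to whichever context was stored last.
    void switchToContext() {
        SwitchTo(contextLexState);
    }

    // Store the context without changing the current lexical state.
    void setLexStateContextNoSwitch(int contextLexState) {
        this.contextLexState = contextLexState;
    }
}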

<DEFAULT> TOKEN [IGNORE_CASE] : {
    < OPERATION: "op" > : OperationType |
    < OBJ0: "obj0" > : { moveToContext(ExtendedContext); } |
    < OBJ1: "obj1" > : { moveToContext(BaseContext); }
}

<OperationType> TOKEN :
{
    < MODIFY : "modify" > : { switchToContext(); }
}

The parser should correctly parse something like:

op modify obj0 arg0

op modify obj0 arg1

obj1 arg0

obj0 arg0 ...

But not those:

op modify obj1 arg1

obj1 arg1

Since arg1 belongs only to the extended context.
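To make the intended behaviour concrete, here is a minimal plain-Java model of the lexical states, written outside JavaCC. It takes the token definitions above as authoritative (obj0 enters ExtendedContext, obj1 enters BaseContext). The class name `ContextLexerSketch`, the `accepts` method, and the rule table are hypothetical names made up for this illustration. It also makes two simplifying assumptions: every word is lower-cased (the original applies IGNORE_CASE only to some states), and "modify" returns to DEFAULT so that the object keyword selects the context. It checks only that each word is a token in the current state, not the full grammar.

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class ContextLexerSketch {
    enum State { DEFAULT, OPERATION_TYPE, BASE_CONTEXT, EXTENDED_CONTEXT }

    // For each lexical state: the words that are tokens there, and the state that follows.
    static final Map<State, Map<String, State>> RULES = new EnumMap<>(State.class);
    static {
        RULES.put(State.DEFAULT, Map.of(
                "op", State.OPERATION_TYPE,
                "obj0", State.EXTENDED_CONTEXT,
                "obj1", State.BASE_CONTEXT));
        // Assumption: "modify" goes back to DEFAULT, where obj0/obj1 are defined.
        RULES.put(State.OPERATION_TYPE, Map.of("modify", State.DEFAULT));
        RULES.put(State.BASE_CONTEXT, Map.of("arg0", State.BASE_CONTEXT));
        RULES.put(State.EXTENDED_CONTEXT, Map.of(
                "arg0", State.EXTENDED_CONTEXT,
                "arg1", State.EXTENDED_CONTEXT));
    }

    // Returns true if every whitespace-separated word is a token in the state reached so far.
    static boolean accepts(String input) {
        State state = State.DEFAULT;
        for (String word : input.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            State next = RULES.get(state).get(word);
            if (next == null) {
                return false; // word is not a token in the current lexical state
            }
            state = next;
        }
        return true;
    }

    public static void main(String[] args) {
        for (String s : List.of("op modify obj0 arg1", "obj1 arg0", "obj1 arg1")) {
            System.out.println(s + " -> " + accepts(s));
        }
    }
}
```

In this model "obj1 arg1" is rejected because arg1 is not a token in BASE_CONTEXT, while "op modify obj0 arg1" is accepted. Note that in the fragments shown, OBJ0 and OBJ1 are only defined in DEFAULT, which is why the sketch sends "modify" back there; whether that fits the real grammar is an open question.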

Any help would be useful! Thanks.


Solution

  • Legacy JavaCC really does not deal with this problem well. The main reason it tends not to work is a longstanding problem: LOOKAHEAD does not work reliably in conjunction with lexical states.

    You really ought to do yourself a favor and consider using CongoCC, which is a much more advanced version of the JavaCC tool. In particular, there are some articles on this whole context-sensitive tokenization problem here, and CongoCC also has a key feature for this: the ability to turn tokens on and off in a given context. See here. If you have any further questions, you might consider asking them here.
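    The interaction between lookahead and lexical states mentioned above can be illustrated with a toy model in plain Java. This is not JavaCC's generated code; `LookaheadStateSketch`, `Lexer`, `TokenStream`, and `argKindAfterObj0` are hypothetical names for this sketch. The idea it shows: if the parser has already peeked at the next token, that token was scanned in the old lexical state, so a state switch made afterwards comes too late for it.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.Locale;

class LookaheadStateSketch {
    enum State { BASE, EXTENDED }

    // Toy token manager: "arg1" is an ARG1 token only in EXTENDED, otherwise UNKNOWN.
    static class Lexer {
        private final Iterator<String> words;
        State state = State.BASE;

        Lexer(String input) {
            words = Arrays.asList(input.trim().split("\\s+")).iterator();
        }

        String next() {
            String w = words.next();
            if (w.equals("arg1")) {
                return state == State.EXTENDED ? "ARG1" : "UNKNOWN";
            }
            return w.toUpperCase(Locale.ROOT);
        }
    }

    // Toy parser-side stream with a one-token lookahead buffer.
    static class TokenStream {
        final Lexer lexer;
        private String buffered;

        TokenStream(Lexer lexer) { this.lexer = lexer; }

        String peek() {
            if (buffered == null) {
                buffered = lexer.next(); // scanned in whatever state the lexer is in right now
            }
            return buffered;
        }

        String consume() {
            String t = peek();
            buffered = null;
            return t;
        }
    }

    // Consumes "obj0", optionally peeks ahead, then switches state and reads the argument.
    static String argKindAfterObj0(boolean peekBeforeSwitch) {
        TokenStream ts = new TokenStream(new Lexer("obj0 arg1"));
        ts.consume();                        // OBJ0
        if (peekBeforeSwitch) {
            ts.peek();                       // lookahead scans "arg1" while still in BASE
        }
        ts.lexer.state = State.EXTENDED;     // state switch requested after the peek
        return ts.consume();
    }

    public static void main(String[] args) {
        System.out.println(argKindAfterObj0(false)); // prints ARG1
        System.out.println(argKindAfterObj0(true));  // prints UNKNOWN
    }
}
```

    In the second call the buffered token was produced before the switch, so the argument is classified under the old state. A tool that can activate or deactivate tokens per context, as described for CongoCC above, is meant to avoid this kind of ordering problem.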