I'm working on a JavaCC parser in which the set of valid tokens depends on the current context. Here's a simplified example:
<DEFAULT> TOKEN [IGNORE_CASE] : {
    < OPERATION: "op" > : OperationType |
    < OBJ0: "obj0" > : ExtendedContext |
    < OBJ1: "obj1" > : BaseContext
}
<BaseContext, ExtendedContext> TOKEN [IGNORE_CASE] : {
    < ARG0: "arg0" >
}
<ExtendedContext> TOKEN [IGNORE_CASE] : {
    < ARG1: "arg1" >
}
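A rough way to picture how these states behave is the plain-Java sketch below, which models the three token blocks as a state-transition table. It is a hand-written illustration, not JavaCC-generated code: the class `LexStateSketch`, the method `tokenizes`, and the enum names are hypothetical, and it assumes whitespace separates tokens and is skipped in every state (the snippet above defines no SKIP rule, and real JavaCC matches characters rather than whitespace-separated words).

```java
import java.util.EnumMap;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical model of the lexical states above; not JavaCC-generated code.
class LexStateSketch {
    enum State { DEFAULT, OPERATION_TYPE, BASE_CONTEXT, EXTENDED_CONTEXT }

    // For each state: token image -> state the lexer is in after the match.
    private static final Map<State, Map<String, State>> RULES = new EnumMap<>(State.class);

    static {
        for (State s : State.values()) {
            RULES.put(s, new HashMap<>());
        }
        // <DEFAULT> block: each token switches to another state
        RULES.get(State.DEFAULT).put("op", State.OPERATION_TYPE);
        RULES.get(State.DEFAULT).put("obj0", State.EXTENDED_CONTEXT);
        RULES.get(State.DEFAULT).put("obj1", State.BASE_CONTEXT);
        // <BaseContext, ExtendedContext> block: ARG0 is shared, no state change
        RULES.get(State.BASE_CONTEXT).put("arg0", State.BASE_CONTEXT);
        RULES.get(State.EXTENDED_CONTEXT).put("arg0", State.EXTENDED_CONTEXT);
        // <ExtendedContext> block: ARG1 only exists here
        RULES.get(State.EXTENDED_CONTEXT).put("arg1", State.EXTENDED_CONTEXT);
    }

    // Returns true if every word is a token in the state current at that point.
    // Assumption: whitespace separates tokens and is skipped in every state.
    static boolean tokenizes(String input) {
        State state = State.DEFAULT;
        for (String word : input.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // empty input yields a single empty word
            }
            // Lower-casing stands in for [IGNORE_CASE]
            State next = RULES.get(state).get(word.toLowerCase(Locale.ROOT));
            if (next == null) {
                return false; // no token matches in this state
            }
            state = next;
        }
        return true;
    }
}
```

Under this model, `LexStateSketch.tokenizes("obj1 arg0")` succeeds while `LexStateSketch.tokenizes("obj1 arg1")` fails, because `arg1` is only defined in ExtendedContext. Anything after `op` fails here simply because this first snippet defines no tokens for OperationType.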
The problem is that I can reach those contexts from different lexical states. Let's say:
<OperationType> TOKEN :
{
    < MODIFY : "modify" > : BaseContext, ExtendedContext
}
Of course, I understand that I cannot specify both lexical states here, but I would need something similar.
I've attempted to implement a SwitchTo strategy based on the context, defining functions that determine whether the operation belongs to ExtendedContext or BaseContext. However, this approach seems to break some functionality, and I'm not sure whether it works as expected or whether there is a better way to address the issue.
Example of a solution I tried (it does not work in all scenarios):
TOKEN_MGR_DECLS : {
    int contextLexState = BaseContext;

    void moveToContext(int contextLexState) {
        setLexStateContextNoSwitch(contextLexState);
        switchToContext();
    }

    void switchToContext() {
        SwitchTo(contextLexState);
    }

    void setLexStateContextNoSwitch(int contextLexState) {
        this.contextLexState = contextLexState;
    }
}
<DEFAULT> TOKEN [IGNORE_CASE] : {
    < OPERATION: "op" > : OperationType |
    < OBJ0: "obj0" > { moveToContext(ExtendedContext); } |
    < OBJ1: "obj1" > { moveToContext(BaseContext); }
}
<OperationType> TOKEN :
{
    < MODIFY : "modify" > { switchToContext(); }
}
The parser should correctly parse something like:
op modify obj0 arg0
op modify obj0 arg1
obj1 arg0
obj0 arg0 ...
But not these:
op modify obj1 arg1
obj1 arg1
Since arg1 belongs only to the extended context.
Any help would be useful! Thanks.
Legacy JavaCC really does not deal with this problem well. The main reason it tends not to work is a longstanding problem: LOOKAHEAD does not work reliably in conjunction with lexical states.
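A toy illustration of that interaction, assuming the problem referred to is the parser buffering lookahead tokens before a lexical-state change takes effect. Everything here (`LookaheadStateSketch`, `ToyLexer`, `BufferingParser`, `secondToken`) is hypothetical plain Java written for the example, not JavaCC's generated code or API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical toy model; none of these classes are part of JavaCC.
class LookaheadStateSketch {

    // Toy lexer: how a word is classified depends on the state when it is read.
    static class ToyLexer {
        private final String[] words;
        private int pos = 0;
        String state = "BaseContext";

        ToyLexer(String input) {
            this.words = input.trim().split("\\s+");
        }

        String next() {
            if (pos >= words.length) {
                return "EOF";
            }
            String w = words[pos++];
            // arg0 is known in both contexts, arg1 only in ExtendedContext
            boolean known = w.equals("arg0")
                    || (w.equals("arg1") && state.equals("ExtendedContext"));
            return (known ? w.toUpperCase(Locale.ROOT) : "UNKNOWN") + "@" + state;
        }
    }

    // Toy parser front end: peek(k) pulls tokens from the lexer into a buffer.
    static class BufferingParser {
        private final ToyLexer lexer;
        private final List<String> buffer = new ArrayList<>();

        BufferingParser(ToyLexer lexer) {
            this.lexer = lexer;
        }

        String peek(int k) {
            while (buffer.size() < k) {
                buffer.add(lexer.next());
            }
            return buffer.get(k - 1);
        }

        String consume() {
            peek(1);
            return buffer.remove(0);
        }
    }

    // Consumes "arg0", switches the lexer state, then returns the next token.
    static String secondToken(boolean lookAheadFirst) {
        ToyLexer lexer = new ToyLexer("arg0 arg1");
        BufferingParser parser = new BufferingParser(lexer);
        if (lookAheadFirst) {
            parser.peek(2); // both words are lexed while still in BaseContext
        }
        parser.consume();
        lexer.state = "ExtendedContext"; // too late for already-buffered tokens
        return parser.consume();
    }
}
```

In this sketch `secondToken(true)` returns `UNKNOWN@BaseContext`, whereas `secondToken(false)` returns `ARG1@ExtendedContext`: once a token has been scanned ahead in the old state, switching states afterwards does not re-tokenize it.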
You really ought to do yourself a favor and consider using CongoCC, which is a much more advanced version of the JavaCC tool. In particular, there are some articles on this whole context-sensitive tokenization problem here, and CongoCC has a key feature for it: the ability to turn tokens on and off in a given context. See here. If you have any further questions, you might consider asking them here.