I'm new to ANTLR and I'm trying to learn it. I have a lexer with several tokens defined, plus another token that is defined as a choice among a subset of them, like so:
ADDQ: 'addq';
SUBQ: 'subq';
ANDQ: 'andq';
XORQ: 'xorq';
OP: (ADDQ | ANDQ | XORQ | SUBQ);
In my parser I have a rule called doOperation, like so:
doOperation:
OP REGISTER COMMA REGISTER;
When I test the rule with IntelliJ's ANTLR plugin on the example input subq %rax, %rcx, I get an error that says, "mismatched input at subq, expect OP". What is the correct way to do this?
You can use token rules inside of other token rules, but when you do, there should be additional text that's matched around it. Something like:
A: 'abc';
B: A 'def';
Given these rules, the string "abc" would produce an A token and "abcdef" would produce a B token.
However, when you define one rule as an alternative of other rules like you did, you end up with multiple lexical rules that could match the same input. When lexical rules overlap, ANTLR (just like the vast majority of lexer generators) will first pick the rule that would lead to the longest match and, in case of ties, pick the one that appears first in the grammar.
So given your rules, the input addq would produce an ADDQ token because ADDQ appears before OP in the grammar. Same for SUBQ and the others. So there's no way an OP token would ever be generated.
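To make the tie-breaking concrete, here is your lexer again with comments tracing what happens for the input subq. The grammar name is made up for illustration; the rules are the ones from your question.

```antlr
lexer grammar TieBreakSketch;   // hypothetical name, for illustration only

// For the input 'subq', both SUBQ and OP match the same four characters.
// The match lengths tie, so the rule that appears first in the grammar wins.
ADDQ: 'addq';
SUBQ: 'subq';                      // listed before OP, so it wins the tie for 'subq'
ANDQ: 'andq';
XORQ: 'xorq';
OP: (ADDQ | ANDQ | XORQ | SUBQ);   // matches the same text, but is listed last
```

The parser therefore sees a SUBQ token where doOperation expects an OP, which is what the error message is reporting.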
Since you said that you don't use ADDQ, SUBQ etc. in your parser rules, you can make them fragments instead of token rules. Fragments can be used in token rules, but aren't themselves tokens. So you'll never end up with a SUBQ token because SUBQ isn't a token; you could only get OP tokens. In fact, you don't even have to give them names at all; you could just "inline" them into OP like this:
OP: 'addq' | 'subq' | 'andq' | 'xorq' ;
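For comparison, the fragment version would look roughly like the sketch below. The grammar name is made up; the rule names are the ones from your question.

```antlr
lexer grammar FragmentSketch;   // hypothetical name, for illustration only

// Fragments are reusable pieces of lexer rules. They never produce tokens themselves.
fragment ADDQ: 'addq';
fragment SUBQ: 'subq';
fragment ANDQ: 'andq';
fragment XORQ: 'xorq';

// OP is now the only token rule that matches these four keywords.
OP: ADDQ | SUBQ | ANDQ | XORQ;
```

Both forms produce the same OP tokens. The fragment form is mostly useful if you want to reuse the individual pieces in other lexer rules.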
Another option (one that you'd have to use if you were using SUBQ etc. directly) is to turn OP into a parser rule instead of a token. That way the input subq would still generate a SUBQ token, but that would be okay because now the op rule would accept a SUBQ token.
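A minimal sketch of that second option is below. Note that the rule is spelled op because parser rule names in ANTLR 4 must start with a lowercase letter. The grammar name and the REGISTER, COMMA and WS definitions are assumptions on my part, since your question doesn't show them; substitute your own.

```antlr
grammar ParserRuleSketch;   // hypothetical name, for illustration only

// Parser rules
doOperation: op REGISTER COMMA REGISTER;
op: ADDQ | SUBQ | ANDQ | XORQ;   // accepts any of the four keyword tokens

// Lexer rules: each keyword stays its own token
ADDQ: 'addq';
SUBQ: 'subq';
ANDQ: 'andq';
XORQ: 'xorq';

// Assumed definitions, not taken from the question
REGISTER: '%' [a-z0-9]+;
COMMA: ',';
WS: [ \t\r\n]+ -> skip;
```

With these assumed definitions, subq %rax, %rcx lexes as SUBQ REGISTER COMMA REGISTER, which doOperation accepts through op.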