Here is my short ANTLR4 grammar:
```antlr
grammar test;
prog: (decl | expr)+
    ;
decl: doc | quiz
    ;
doc: '%doc' paramlist
    ;
quiz: '%quiz' paramlist STR? '%quiz' ENDL
    ;
paramlist: '(' VAR '=' PARAMVAL {, VAR '=' PARAMVAL}')'
    ;
expr: expr '*' expr
    | expr '+' expr
    | expr '-' expr
    | DOC
    ;
// tokens
DOC: 'doc';
PERCENT: '%';
VAR: [a-zA-Z_][a-zA-Z0-9_]* ;
PARAMVAL: [^,]+ | '"' [^"]* '"' ;
STR: (~["\\\r\n] | EscapeSequence)+ ;
fragment EscapeSequence
    : '\\' 'u005c'? [btnfr"'\\]
    | '\\' 'u005c'? ([0-3]? [0-7])? [0-7]
    | '\\' 'u'+ HexDigit HexDigit HexDigit HexDigit
    ;
fragment HexDigit: [0-9a-fA-F];
ENDL: '\n' ;
WS: [ \t\n]+ -> skip;
```
To use the doc parser rule, I write '%doc', and ANTLR recognizes it according to the parse tree.
However, when I fill in the parameter list, for example %doc(v=^), the parse tree instead shows the whole input recognized as STR.
The same happens with quiz.
It works when I add a delimiter around the STR rule, but I would like to use STR without one.
Why is STR being matched when none of the parser rules involved use it? (Except quiz, but there STR appears in the middle of the rule.)
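As 500 - Internal Server Error mentioned in the comments, the lexer works independently of the parser. It follows two rules:

1. match as many characters as possible;
2. if two or more lexer rules match the same number of characters, the rule defined first wins.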
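Because of the first rule, the input "%doc(v=^)" becomes a single STR token: STR matches every character except double quotes, backslashes and line breaks, so it consumes all nine characters, whereas '%doc' matches only the first four.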
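Both rules can be tried out with a toy Python model of the lexer applied to the grammar above. This is a sketch under stated assumptions, not ANTLR's implementation: the lexer rules are hand-translated to Python regexes, STR's escape sequences are left out, and the literals used in parser rules are modelled as implicit tokens tried before the explicit lexer rules, with the names T__1 to T__7 assumed for illustration.

```python
import re

# Toy model of the ANTLR lexer's two rules (a sketch, not ANTLR's real code):
#   1. the rule that matches the most characters wins;
#   2. on a tie, the rule defined first wins.
# Rules are hand-translated to Python regexes, in the order they are tried.
# Implicit tokens for parser literals come first; T__1..T__7 names are assumed.
RULES = [(name, re.compile(pattern)) for name, pattern in [
    ("T__0", r"%doc"),
    ("T__1", r"%quiz"),
    ("T__2", r"\("),
    ("T__3", r"="),
    ("T__4", r"\)"),
    ("T__5", r"\*"),
    ("T__6", r"\+"),
    ("T__7", r"-"),
    ("DOC", r"doc"),
    ("PERCENT", r"%"),
    ("VAR", r"[a-zA-Z_][a-zA-Z0-9_]*"),
    ("PARAMVAL", r'[\^,]+|"[\^"]*"'),  # ANTLR [^,] is the set {'^', ','}
    ("STR", r'[^"\\\r\n]+'),           # escape sequences left out for brevity
    ("ENDL", r"\n"),
    ("WS", r"[ \t\n]+"),
]]


def tokenize(text):
    """Split text into (type, lexeme) pairs; WS is dropped like '-> skip'."""
    tokens, pos = [], 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            # Strict '>' means a later rule must be strictly longer (rule 2).
            if m and len(m.group()) > best_len:
                best_name, best_len = name, len(m.group())
        if best_name is None:
            raise ValueError(f"token recognition error at position {pos}")
        if best_name != "WS":
            tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens


print(tokenize("%doc(v=^)"))  # [('STR', '%doc(v=^)')]
print(tokenize("%doc"))       # [('T__0', '%doc')]
```

For a bare %doc, the implicit token and STR tie at four characters and the earlier rule wins, which is why %doc on its own appears to work; as soon as the parameter list follows, STR is strictly longer and takes the whole line.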
Some other things are incorrect, or work differently than you might think. When you use literal tokens inside parser rules, ANTLR creates lexer rules for them automatically. So if you write:
```antlr
doc
 : '%doc' paramlist
 ;

DOC : 'doc';
PERCENT : '%';
```
ANTLR will create this behind the scenes:
```antlr
doc
 : T__0 paramlist
 ;

T__0 : '%doc';
DOC : 'doc';
PERCENT : '%';
```
Because of rule 1, the input "%doc" will always become a T__0 token, and never a PERCENT token followed by a DOC token.
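The effect of the implicit T__0 rule can be checked with a cut-down toy lexer. This is a hypothetical Python model with hand-translated regexes, not code generated by ANTLR; it compares the tokens produced for "%doc" with and without T__0 in the rule list.

```python
import re


# Hypothetical toy lexer (not ANTLR's implementation): the longest match wins,
# and on a tie the rule listed first wins.
def tokenize(text, rules):
    compiled = [(name, re.compile(p)) for name, p in rules]
    tokens, pos = [], 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in compiled:
            m = pattern.match(text, pos)
            if m and len(m.group()) > best_len:  # '>' keeps earlier rule on ties
                best_name, best_len = name, len(m.group())
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens


# The explicit lexer rules relevant to "%doc" (hand-translated).
EXPLICIT = [("DOC", r"doc"), ("PERCENT", r"%"), ("VAR", r"[a-zA-Z_][a-zA-Z0-9_]*")]
# What the lexer effectively has once '%doc' is used as a literal in a parser rule.
WITH_IMPLICIT = [("T__0", r"%doc")] + EXPLICIT

print(tokenize("%doc", WITH_IMPLICIT))  # [('T__0', '%doc')]
print(tokenize("%doc", EXPLICIT))       # [('PERCENT', '%'), ('DOC', 'doc')]
print(tokenize("doc", EXPLICIT))        # [('DOC', 'doc')]
```

The last line also shows the second rule at work: DOC and VAR both match the three characters of "doc", and DOC wins only because it is listed first.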
Also, `[^,]` does not match any character other than a comma: it matches either a `^` or a `,`. You probably meant `~[,]`. But be careful: `~[,]+` will again (like STR) match far too many characters.
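Python's regex syntax happens to use the opposite convention, which makes the difference easy to check. In the sketch below the Python classes are hand-translated approximations of the ANTLR sets (an assumption for illustration: ANTLR's `[^,]` is written as Python `[\^,]`, and ANTLR's `~[,]` as Python `[^,]`), applied to the "%doc(v=^)" input from above.

```python
import re

# Hand-translated approximations (assumption): in ANTLR, [^,] is a set of two
# literal characters and ~[,] is the negation; in Python it is the other way round.
antlr_caret_or_comma = re.compile(r"[\^,]+")  # ANTLR: [^,]+
antlr_not_comma = re.compile(r"[^,]+")        # ANTLR: ~[,]+

text = "%doc(v=^)"
value_start = text.index("^")  # where the parameter value begins (index 7)

# [^,]+ rejects ordinary values and only accepts runs of '^' and ','
print(antlr_caret_or_comma.fullmatch("abc"))                  # None
print(antlr_caret_or_comma.match(text, value_start).group())  # ^

# ~[,]+ accepts the value, but also runs on past the closing parenthesis...
print(antlr_not_comma.match(text, value_start).group())       # ^)
# ...and, tried at the start of the input, it covers the whole line, like STR
print(antlr_not_comma.match(text).group())                    # %doc(v=^)
```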