Suppose I have a simple grammar which recognizes lower-case words. Among the words I have some reserved words that I need to address differently.
grammar test;
prog: Reserved | Identifier;
Reserved: 'reserved';
Identifier: [a-z]+;
Given the grammar above, is it guaranteed that for the input "reserved" the lexer will always produce the Reserved token?
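Yes. ANTLR's lexer operates using the following rules:
1. it tries to match as many characters as possible;
2. if two or more rules match the same number of characters, the rule defined first wins.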
Because the input "reserved" can be matched by both the Reserved and Identifier rules, the one defined first (Reserved) gets precedence.
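To make the tie-breaking concrete, here is a minimal Python sketch that imitates the behaviour described here: take the longest match, and when two rules match the same length, keep the one defined first. This is not ANTLR's implementation. The RULES table and the next_token helper are hypothetical names made up for illustration, and the regular expressions are hand-written stand-ins for the two lexer rules of the grammar.

```python
import re

# Lexer rules in definition order, mirroring the grammar in the question.
# Hypothetical stand-ins: plain regexes, not ANTLR's own machinery.
RULES = [
    ("Reserved", re.compile(r"reserved")),
    ("Identifier", re.compile(r"[a-z]+")),
]


def next_token(text, pos=0):
    """Return (rule_name, lexeme) for the token starting at pos."""
    best = None
    for name, pattern in RULES:
        match = pattern.match(text, pos)
        if match is None:
            continue
        lexeme = match.group()
        # Only a strictly longer match replaces the current best,
        # so on a tie the rule defined earlier is kept.
        if best is None or len(lexeme) > len(best[1]):
            best = (name, lexeme)
    if best is None:
        raise ValueError(f"no rule matches at position {pos}")
    return best


print(next_token("reserved"))   # ('Reserved', 'reserved')
print(next_token("reservedx"))  # ('Identifier', 'reservedx')
print(next_token("res"))        # ('Identifier', 'res')
```

Note the second call: "reservedx" becomes an Identifier, because the longer match beats definition order. Definition order only decides between matches of equal length.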
It sometimes happens that keywords can also be used as identifiers. This is often handled by introducing an identifier parser rule that matches an Identifier token or some reserved keywords:
identifier : Reserved | Identifier;
Reserved : 'reserved';
Identifier : [a-z]+;
and then use identifier in other parser rules instead of directly using Identifier.
After rereading your question: yes, parser rule alternatives are tried from left to right (top to bottom). In the rule p:
p : a | b | c;
first a is tried, then b, and lastly c.
Note that in your example, prog: Reserved | Identifier;, there is no ambiguity, since the input "reserved" will never become an Identifier token (the first part of my answer).