[SOLVED] ANTLR4: implicit or explicit token definition

ANTLR4: implicit or explicit token definition

What are the benefits and drawbacks of using explicit token definitions in ANTLR4? I find the text in single parentheses more descriptive and easier to use than creating a separate token and using that in place of the text.

E.g.:

grammar SimpleTest;

top: library | module ;

library: 'library' library_name ';' ;
library_name: IDENTIFIER;         

module: MODULE module_name ';' ;
module_name: IDENTIFIER;

MODULE: 'module' ;
IDENTIFIER: [a-zA-Z0-9]+;

The generated tokens are:

T__0=1
T__1=2
MODULE=3
IDENTIFIER=4
'library'=1
';'=2
'module'=3

If I am not interested in the 'library' "token", since the rule already establishes what I am matching against, and I will just skip over it anyway, does it make any sense to replace it with LIBRARY and a token declaration? (The number of tokens then will grow.) Why is this a warning in ANTLRWorks?

Solution

Actually, there is a difference between implicit and explicit tokens:

From "The Definitive ANTLR4 Reference", page 76:

ANTLR collects and separates all of the string literals and lexer rules from the parser rules. Literals such as 'enum' become lexical rules and go immediately after the parser rules but before the explicit lexical rules.

ANTLR lexers resolve ambiguities between lexical rules by favoring the rule specified first.

Highlight from me.