
antlr2 to antlr4 class specifier, options, TOKENS and more


I need to rewrite a grammar file from antlr2 syntax to antlr4 syntax and have the following questions.

1) Bart Kiers states there is a strict order: grammar, options, tokens, @header, @members in this SO post. This antlr2.org post disagrees stating header is before options. Is there a resource that states the correct order (if one exists) for antlr4?

2) The same antlr2.org post states: "The options section for a grammar, if specified, must immediately follow the ';' of the class specifier":

class MyParser extends Parser;
options { k=2; }

However, when running with antlr4, any class specifier creates this error:

syntax error: missing COLON at 'MyParser' while matching a rule

3) The post "What happened to options in antlr4?" says there are no rule-level options at the time. My options produce these warnings:

warning(83): MyGrammar.g4:4:4: unsupported option k
warning(83): MyGrammar.g4:5:4: unsupported option exportVocab
warning(83): MyGrammar.g4:6:4: unsupported option codeGenMakeSwitchThreshold
warning(83): MyGrammar.g4:7:4: unsupported option codeGenBitsetTestThreshold
warning(83): MyGrammar.g4:8:4: unsupported option defaultErrorHandler
warning(83): MyGrammar.g4:9:4: unsupported option buildAST

i.) does antlr4's adaptive LL(*) parsing algorithm no longer require k token lookahead?

ii.) is there an equivalent in antlr4 for exportVocab?

iii.) are there equivalents in antlr4 for optimizations codeGenMakeSwitchThreshold and codeGenBitsetTestThreshold or have they become obsolete?

iv.) is there an equivalent for defaultErrorHandler?

v.) I know antlr4 no longer builds AST. I'm still trying to get a grasp of how this will affect what uses the currently generated *Parser.java and *Lexer.java.

4) My current grammar file specifies a TOKENS section

tokens {
    ROOT; FOO; BAR; TRUE="true"; FALSE="false"; NULL="null";
}

I changed the double quotes to single quotes and the semi-colons to commas and the equal sign to a colon to try and get rid of each syntax error but have this error:

mismatched input ':' expecting RBRACE

along with others. Rewritten looks like:

tokens {
    ROOT; FOO; BAR; TRUE:'true'; FALSE:'false' ...
}

so I removed :'true' and :'false' and TRUE and FALSE will appear in the generated MyGrammar.tokens but I'm not sure if it will function the same as before.

Thanks!


Solution

    1. Look at the ultimate source for the syntax: the ANTLR4 grammar. As you can see, order plays no role in the prequel section (which includes named actions, options and the like; you can even have more than one options section). The only condition is that the prequel section must appear before any rule.

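    2. The error is about the class specifier, which ANTLR4 does not support. An ANTLR4 grammar starts with a grammar declaration (optionally prefixed with lexer or parser) instead. Remove the class specifier and the error will go away.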

    3. Many (actually most) of the old options are no longer needed or supported in ANTLR4.

    i.) ANTLR4 uses unlimited lookahead (hence the * in ALL(*)). You cannot specify any other lookahead.

    ii.) The exportVocab option is long gone (not even ANTLR3 supports it). It only specified a name for the .tokens file. Use the default name instead.

    iii.) Nothing like that is needed or supported anymore. The prediction algorithm has completely changed in ANTLR4.

    iv.) You use an error listener instead. There are many examples of how to do that (also here on SO).

    v.) Is that a question or just thinking out loud? Hint: ANTLR4-based parsers generate a parse tree.

    4. I'm not 100% sure about this one, but I believe you can no longer specify the value a token should match in the tokens section. Instead, this section is only for virtual tokens, and everything else must be specified as normal lexer rules.
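
    As a minimal sketch of point 4, the tokens section from the question could be split into virtual tokens plus ordinary lexer rules. The ID rule is a hypothetical stand-in for whatever identifier rule the real grammar has; it is shown only because the keyword rules need to come before it.

    ```antlr
    // Virtual tokens only: comma-separated, no literal values, no trailing semicolons.
    tokens { ROOT, FOO, BAR }

    // Literals that used to live in the tokens section become normal lexer rules.
    TRUE  : 'true' ;
    FALSE : 'false' ;
    NULL  : 'null' ;

    // Hypothetical identifier rule (not from the question). The keyword rules are
    // listed first because, when two rules match the same text, the earlier one wins.
    ID : [a-zA-Z_] [a-zA-Z_0-9]* ;
    ```

    With this layout TRUE, FALSE and NULL still appear in the generated .tokens file, and the lexer actually produces them when it sees the corresponding text.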

    To sum up: most of the special options and tricks required for older ANTLR grammars are no longer needed and must be removed. The new parsing algorithm automatically deals with the ambiguities that former versions had trouble with and needed user guidance to resolve.
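
    Put together, the start of a migrated grammar might look roughly like the sketch below. The name MyGrammar is taken from the warnings in the question; the value rule, the language option and the action bodies are placeholders assumed for illustration, not part of the original grammar.

    ```antlr
    // Replaces "class MyParser extends Parser;" from the ANTLR2 grammar.
    grammar MyGrammar;

    // Prequel section: these blocks may appear in any order, but before the first rule.
    // k, exportVocab, buildAST and the codeGen* thresholds are simply dropped.
    options { language = Java; }

    tokens { ROOT, FOO, BAR }

    @header  { /* package declaration and imports go here */ }
    @members { /* fields and helper methods go here */ }

    // Hypothetical parser rule; the real rules follow the prequel section.
    value : TRUE | FALSE | NULL ;

    TRUE  : 'true' ;
    FALSE : 'false' ;
    NULL  : 'null' ;
    WS    : [ \t\r\n]+ -> skip ;
    ```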