antlr antlr4

Precedence in ANTLR4 OR rules


Suppose I have a simple grammar which recognizes lower-case words. Among the words I have some reserved words that I need to address differently.

grammar test;

prog: Reserved | Identifier;

Reserved: 'reserved';
Identifier: [a-z]+;

Given the grammar above, is it guaranteed that for the input "reserved" the lexer will always produce the Reserved token?


Solution

  • Yes, ANTLR's lexer operates using the following rules:

    1. try to match as many characters as possible for a single token (the longest match wins)
    2. if two or more rules match the same number of characters, the rule defined first in the grammar wins

    Because the input "reserved" is matched in full by both the Reserved and Identifier rules, the one defined first (Reserved) takes precedence.
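To make the two rules concrete, here is a minimal Python sketch that imitates them with regular expressions. It is not ANTLR's actual implementation: the `RULES` table and the `next_token` and `tokenize` helpers are hypothetical names invented for this illustration, and the sketch assumes each pattern's own match is already its longest one, which holds for these two simple rules.

```python
import re

# Hypothetical stand-in for the lexer rules of the grammar, in definition order.
RULES = [
    ("Reserved", re.compile(r"reserved")),
    ("Identifier", re.compile(r"[a-z]+")),
]


def next_token(text, pos=0):
    """Return (rule_name, lexeme) for the token starting at pos, or None."""
    best = None
    for name, pattern in RULES:
        match = pattern.match(text, pos)
        if match is None or match.end() == pos:
            continue  # this rule does not match here
        # Rule 1: a strictly longer match replaces the current best.
        # Rule 2: on a tie the earlier rule is kept, because only a
        # strictly longer match can displace it.
        if best is None or len(match.group()) > len(best[1]):
            best = (name, match.group())
    return best


def tokenize(text):
    """Split text into (rule_name, lexeme) pairs using the two rules above."""
    tokens = []
    pos = 0
    while pos < len(text):
        token = next_token(text, pos)
        if token is None:
            raise ValueError(f"no rule matches at position {pos}")
        tokens.append(token)
        pos += len(token[1])
    return tokens


print(tokenize("reserved"))   # [('Reserved', 'reserved')]
print(tokenize("reservedx"))  # [('Identifier', 'reservedx')]
print(tokenize("abc"))        # [('Identifier', 'abc')]
```

Note the second case: "reservedx" becomes a single Identifier, because the longest match is considered before definition order.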

    It sometimes happens that keywords can also be used as identifiers. This is often handled by introducing an identifier parser rule that matches either an Identifier token or one of the reserved keywords:

    identifier : Reserved | Identifier;
    
    Reserved   : 'reserved';
    Identifier : [a-z]+;
    

    and then use identifier in other parser rules instead of directly using Identifier.

    EDIT

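    After rereading your question: yes, the order of parser rule alternatives matters too. When more than one alternative can match the same input, ANTLR4 resolves the ambiguity in favor of the alternative listed first (left to right, top to bottom). In the rule p : a | b | c;, a takes precedence over b, and b over c.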

    Note that in your example prog: Reserved | Identifier;, there is no ambiguity since the input "reserved" will never become an Identifier token (the first part of my answer).