
ANTLR’s predicated-LL(*) parsing mechanism


I'm building the following grammar:

Letter     : 'a'..'z'|'A'..'Z'     ; 

Number      : '0'..'9'     ; 

Float 
   :   Number+ '.' Number+  
   ; 

a5 
@init 
{ 
 int n = 1; 
} 
: ({n<=5}?=>(Letter|Number){n++;})+  
;

It does not parse the string "CD923IJK" correctly: I need it to consume "CD923", but it consumes "CDIJK" instead.

If the Float rule is commented out, the problem disappears and "CD923" is consumed as I want.

This seems to need deeper lookahead. Since the grammar is LL(k), I set the lookahead depth:

options
{
k=5;
}

But that did not solve anything. Any ideas?

UPDATE

In response to the suggestion from 500 - Internal Server Error, I added the following rule:

public test :a5 Float   ;

I need to match CD9231.23, where CD923 is an alphanumeric string and 1.23 is a float. But see the parse tree: [image: parse tree]


Solution

  • The problem seems to be in the rules Number and Float: there is an ambiguity between these two rules. Because both Number and Float are lexer rules, recall that ANTLR implicitly creates a nextToken rule to handle all the tokens. The nextToken rule in this example looks like this:

    nextToken: Letter | Number | Float;
    

    When ANTLR finds a digit, it walks through the DFA to decide which rule to jump to, but in this case it cannot decide which is the proper rule (Number or Float). You can avoid this behavior by making Float a parser rule instead. You can try something like this:

    grammar a5;
    
    s   : a5 coordinate? 
        ;
    
    a5 
    @init{
     int n = 0;
    }
    : ({n<5}?=> (LETTER|Number){n++;})+
    ;
    
    
    Number  :   '0'..'9'
        ;
    
    coordinate  :    Number+ '.' Number+
        ;
    
    LETTER
        :   'a'..'z'|'A'..'Z'
        ;
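
To make the intended behavior of the restructured grammar concrete, here is a minimal hand-written sketch in Python. It is not ANTLR output and does not model ANTLR's prediction DFA; it only imitates the idea that the lexer emits one-character tokens while the parser handles the five-character limit and the float. The function names (`tokenize`, `parse_a5`, `parse_coordinate`, `parse_s`) and the `DOT` token name are assumptions made for illustration, and the optional `coordinate?` is approximated by simply trying the rule and falling back if it does not match.

```python
from typing import List, Optional, Tuple

Token = Tuple[str, str]  # (token type, matched text)


def tokenize(text: str) -> List[Token]:
    """One token per character, mirroring LETTER, Number and the '.' literal."""
    tokens: List[Token] = []
    for ch in text:
        if ch.isascii() and ch.isalpha():
            tokens.append(("LETTER", ch))
        elif ch.isascii() and ch.isdigit():
            tokens.append(("Number", ch))
        elif ch == ".":
            tokens.append(("DOT", ch))  # hypothetical name for the '.' literal
        else:
            raise ValueError(f"unexpected character: {ch!r}")
    return tokens


def parse_a5(tokens: List[Token], pos: int) -> Tuple[str, int]:
    """a5 : ({n<5}?=> (LETTER|Number) {n++;})+  -- at most five tokens."""
    n = 0
    start = pos
    while n < 5 and pos < len(tokens) and tokens[pos][0] in ("LETTER", "Number"):
        pos += 1
        n += 1
    if pos == start:
        raise ValueError("a5 needs at least one letter or digit")
    return "".join(t[1] for t in tokens[start:pos]), pos


def skip_numbers(tokens: List[Token], pos: int) -> int:
    """Advance past a run of Number tokens and return the new position."""
    while pos < len(tokens) and tokens[pos][0] == "Number":
        pos += 1
    return pos


def parse_coordinate(tokens: List[Token], pos: int) -> Optional[Tuple[str, int]]:
    """coordinate : Number+ '.' Number+  -- returns None if it does not match."""
    int_end = skip_numbers(tokens, pos)
    if int_end == pos:
        return None
    if int_end >= len(tokens) or tokens[int_end][0] != "DOT":
        return None
    frac_end = skip_numbers(tokens, int_end + 1)
    if frac_end == int_end + 1:
        return None
    return "".join(t[1] for t in tokens[pos:frac_end]), frac_end


def parse_s(text: str) -> Tuple[str, Optional[str], str]:
    """s : a5 coordinate?  -- returns (a5 text, coordinate or None, unconsumed rest)."""
    tokens = tokenize(text)
    prefix, pos = parse_a5(tokens, 0)
    coordinate = None
    match = parse_coordinate(tokens, pos)
    if match is not None:
        coordinate, pos = match
    rest = "".join(t[1] for t in tokens[pos:])
    return prefix, coordinate, rest


print(parse_s("CD9231.23"))  # ('CD923', '1.23', '')
print(parse_s("CD923IJK"))   # ('CD923', None, 'IJK')
```

In this sketch the digits never compete with a multi-character float token, so the first five letters or digits always go to a5, which is the effect the grammar above is aiming for.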