antlr4

How do you match partially complete floating-point numbers unambiguously?


Say we have the text 2.at(0). Is there a way to tell ANTLR4 that 2. cannot be a float with its trailing 0 omitted here, unambiguously and without consuming the following token?


Solution

  • Assuming you want to accept 2. in 2. + 3 as a float, but not inside 2.at(0) (or 2. at(0)), then no, that is not possible in ANTLR without some sort of predicate.

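    With a predicate, you'll need to add target-specific code to your grammar that determines whether a . is part of a float or is a DOT token. The predicate looks past the . and any spaces: if the next character is a letter, the . cannot belong to a float. For the Java target, that might look like this: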

    lexer grammar DemoLexer;
    
    @header {
    import java.util.*;
    }
    
    @members {
      private boolean nakedDotPartOfFloat() {
    
        // Start looking ahead 2 steps (1 step ahead is the '.')
        for (int i = 2; ; i++) {
          char nextChar = (char)_input.LA(i);
    
          if (Character.isWhitespace(nextChar)) {
            // Ignore whitespace, including the tabs and line breaks that SPACE skips
            continue;
          }
    
          // If the character after the '.' is a letter, a float is not possible
          return !Character.isLetter(nextChar);
        }
      }
    }
    
    ADD
     : '+'
     ;
    
    DOT
     : '.'
     ;
    
    INT
     : [0-9]+
     ;
    
    FLOAT
     : [0-9]+ '.' [0-9]+
     | [0-9]+ {nakedDotPartOfFloat()}? '.'
     ;
    
    ID
     : [a-zA-Z]+
     ;
    
    SPACE
     : [ \t\r\n] -> skip
     ;
    

    If you then tokenize the input "2. 2.1 2.foo 2.+3", you'd get the following tokens:

    9 tokens:
      1    FLOAT                          '2.'
      2    FLOAT                          '2.1'
      3    INT                            '2'
      4    DOT                            '.'
      5    ID                             'foo'
      6    FLOAT                          '2.'
      7    ADD                            '+'
      8    INT                            '3'
      9    EOF                            '<EOF>'
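
The lookahead rule used by the predicate can also be tried outside ANTLR. The following is a minimal sketch under stated assumptions, not part of the grammar above: DotLookahead and its method signature are hypothetical names chosen for illustration, it scans a plain String instead of the lexer's _input stream, and it uses Character.isWhitespace so that tabs and line breaks are skipped as well as spaces.

```java
// Standalone sketch of the lookahead idea. DotLookahead is a hypothetical
// helper for illustration; it is not part of ANTLR or of the grammar above.
final class DotLookahead {

    // Returns true if the '.' at dotIndex may end a float such as "2.",
    // i.e. the next non-whitespace character after it is not a letter.
    static boolean nakedDotPartOfFloat(String input, int dotIndex) {
        for (int i = dotIndex + 1; i < input.length(); i++) {
            char nextChar = input.charAt(i);
            if (Character.isWhitespace(nextChar)) {
                continue; // skip whitespace between the '.' and what follows
            }
            // A letter after the '.' means member access, so no float
            return !Character.isLetter(nextChar);
        }
        // End of input: nothing follows the '.', so "2." stays a float
        return true;
    }

    public static void main(String[] args) {
        String[] samples = {"2.at(0)", "2. at(0)", "2. + 3", "2."};
        for (String s : samples) {
            System.out.println(s + " -> " + nakedDotPartOfFloat(s, s.indexOf('.')));
        }
    }
}
```

For the four samples this prints false, false, true, true: the first two keep the . as a separate DOT token, and the last two let 2. be read as a float.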