
ANTLR 3 grammar generates a parsing error on encountering the pound character


ANTLR 3 reports an error when it encounters the pound character ("£") of the French language, which is the equivalent of the English hash character "#", even though the Unicode values of the three special characters @, #, and $ are specified in the lexer/parser rules.

FYI: the Unicode value of the pound character (French) is the same as the Unicode value of the hash character (English).

The lexer/parser rules:

grammar SimpleCalc;

options
{
  k        = 8;
  language = Java;
  //filter   = true;
}
 
tokens {
    PLUS    = '+' ;
    MINUS   = '-' ;
    MULT    = '*' ;
    DIV = '/' ;
}
 
/*------------------------------------------------------------------
 * PARSER RULES
 *------------------------------------------------------------------*/
 
expr    : n1=NUMBER ( exp = ( PLUS | MINUS )  n2=NUMBER )* 
{
  if ($exp.text.equals("+"))
   System.out.println("Plus Result = " + $n1.text + $n2.text);
  else
   System.out.println("Minus Result = " + $n1.text + $n2.text);
}
;
 
/*------------------------------------------------------------------
 * LEXER RULES
 *------------------------------------------------------------------*/
 
NUMBER  : (DIGIT)+ ;
 
WHITESPACE : ( '\t' | ' ' | '\r' | '\n'| '\u000C' )+    { $channel = HIDDEN; } ;
 
fragment DIGIT  : '0'..'9' | '£' | ('\u0040' | '\u0023' | '\u0024');

The text file is also read as UTF-8:

    public static void main(String[] args) throws Exception
    {
        try
        {
            args = new String[1];
            args[0] = new String("antlr_test.txt");
            SimpleCalcLexer lex = new SimpleCalcLexer(new ANTLRFileStream(args[0], "UTF-8"));
            CommonTokenStream tokens = new CommonTokenStream(lex);
            
            SimpleCalcParser parser = new SimpleCalcParser(tokens);
            
            parser.expr();
            //System.out.println(tokens);
        }
        catch (Exception e)
        {
            e.printStackTrace();
        }
    }

The input file contains only one line:

 £3 + 4£
 

The error is:

antlr_test.txt line 1:1 no viable alternative at character '£'
antlr_test.txt line 1:7 no viable alternative at character '£'

What is wrong with my approach? Or did I miss something?


Solution

  • I cannot reproduce what you describe. When I test your grammar without modifications, I get a NumberFormatException, which is expected, because Integer.parseInt("£3") cannot succeed.

    When I change your embedded code into this:

    {
      if ($exp.text.equals("+"))
       System.out.println("Result = " + (Integer.parseInt($n1.text.replaceAll("\\D", "")) + Integer.parseInt($n2.text.replaceAll("\\D", ""))));
      else
       System.out.println("Result = " + (Integer.parseInt($n1.text.replaceAll("\\D", "")) - Integer.parseInt($n2.text.replaceAll("\\D", ""))));
    }
    

    and regenerate lexer and parser classes (something you might not have done) and rerun the driver code, I get the following output:

    Result = 7
    

    EDIT

    Perhaps the pound sign in the grammar is the issue? What if you try:

    fragment DIGIT  : '0'..'9' | '\u00A3' | ('\u0040' | '\u0023' | '\u0024');
    

    instead of:

    fragment DIGIT  : '0'..'9' | '£' | ('\u0040' | '\u0023' | '\u0024');
    

    ?
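
To illustrate why the escaped form may behave differently from the literal, here is a minimal Java sketch. It prints the code points of the pound sign and the hash sign (U+00A3 and U+0023, which are different characters), and then simulates what happens when the UTF-8 bytes of "£" are decoded with a different charset. The class name `PoundCodePoints` and its helper methods are hypothetical, and the idea that the grammar file is being read with a mismatched encoding is only an assumption about the cause, not something confirmed by the question.

```java
import java.nio.charset.StandardCharsets;

public class PoundCodePoints {

    // Formats a character's code point in the usual U+XXXX notation.
    static String codePoint(char c) {
        return String.format("U+%04X", (int) c);
    }

    // Encodes the string as UTF-8, then decodes those bytes as ISO-8859-1.
    // This simulates a tool reading a UTF-8 file with the wrong charset
    // (an assumed failure mode, used here only for illustration).
    static String misdecode(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        char pound = '\u00A3';
        char hash = '#';

        System.out.println("pound = " + codePoint(pound));
        System.out.println("hash  = " + codePoint(hash));

        // The pound sign takes two bytes in UTF-8, so a wrong decode
        // produces two characters instead of one.
        String garbled = misdecode(String.valueOf(pound));
        System.out.println("characters after wrong decode = " + garbled.length());
        for (char c : garbled.toCharArray()) {
            System.out.println("  " + codePoint(c));
        }
    }
}
```

If the literal '£' in the grammar were decoded this way, the generated lexer would match a different character sequence than the single U+00A3 read from the UTF-8 input file. Writing '\u00A3' in the grammar sidesteps the question of how the grammar file itself is encoded.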