java, parsing, antlr4

How can I get the unrecognized token in an ANTLRErrorListener for a lexer?


I have a lexer generated with ANTLR. What I am trying to achieve is to handle the error raised during lexing when a token is not recognized.

How can I get only the character causing this error?


lexer.addErrorListener(new ANTLRErrorListener() {
  @Override
  public void syntaxError(Recognizer<?, ?> recognizer,  
                          Object offendingSymbol, int line,
                          int charPositionInLine, String msg,
                          RecognitionException e) {

  }

  // ....
});

The given message msg contains both a description and the character, but I would like to have only the character so that I can generate a better message for the users of my code.


Solution

  • You can get the offending character like this:

    lexer.addErrorListener(new ANTLRErrorListener() {
        @Override
        public void syntaxError(Recognizer<?, ?> recognizer,
                                Object offendingSymbol, int line,
                                int charPositionInLine, String msg,
                                RecognitionException e) {
    
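            // Current position of the input stream at the time the error is reported.
            int index = recognizer.getInputStream().index();
            // Interval is org.antlr.v4.runtime.misc.Interval; both bounds are inclusive,
            // so this reads exactly one character. `lexer` must be effectively final.
            String text = lexer.getInputStream().getText(Interval.of(index, index));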
    
            throw new RuntimeException(String.format("Don't know how to handle '%s'", text));
        }
    
        // ...
    });
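
    A variation on the same idea, as a minimal sketch rather than a definitive implementation: for lexer errors, the RecognitionException passed to the listener is normally a LexerNoViableAltException, which exposes the input stream and the index where the failed token started. The class name OffendingCharListener and its getOffendingTexts() accessor are hypothetical names chosen for illustration. Extending BaseErrorListener avoids having to implement the other ANTLRErrorListener methods, and the sketch collects the characters instead of throwing.

```java
import org.antlr.v4.runtime.BaseErrorListener;
import org.antlr.v4.runtime.CharStream;
import org.antlr.v4.runtime.LexerNoViableAltException;
import org.antlr.v4.runtime.RecognitionException;
import org.antlr.v4.runtime.Recognizer;
import org.antlr.v4.runtime.misc.Interval;

import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of ANTLR): records the character at which
// each unrecognized token starts.
class OffendingCharListener extends BaseErrorListener {
    private final List<String> offendingTexts = new ArrayList<>();

    @Override
    public void syntaxError(Recognizer<?, ?> recognizer,
                            Object offendingSymbol, int line,
                            int charPositionInLine, String msg,
                            RecognitionException e) {
        // Only lexer "no viable alternative" errors carry a start index.
        if (e instanceof LexerNoViableAltException) {
            LexerNoViableAltException lexerError = (LexerNoViableAltException) e;
            CharStream input = lexerError.getInputStream();
            int start = lexerError.getStartIndex();
            // Guard against an index outside the stream before reading.
            if (start >= 0 && start < input.size()) {
                // Interval bounds are inclusive, so this is one character.
                offendingTexts.add(input.getText(Interval.of(start, start)));
            }
        }
    }

    List<String> getOffendingTexts() {
        return offendingTexts;
    }
}
```

    You would register it with lexer.removeErrorListeners() followed by lexer.addErrorListener(new OffendingCharListener()), then build your own user-facing message from the collected characters once lexing has finished.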