javaantlrerror-recovery

How to stop ANTLR from suppressing syntax errors?


So I'm writing a compiler in Java using ANTLR, and I'm a little puzzled by how it deals with errors.

The default behavior seems to be to print an error message and then attempt, by means of token insertion and such, to recover from the error and continue parsing. I like this in principle; it means that (in the best case) if the user has committed more than one syntax error, they'll get one message per error, but it'll mention all the errors instead of forcing them to recompile to discover the next one. The default error message is fine for my purposes. The trouble comes when it's done reading all the tokens.

I am, of course, using ANTLR's tree constructors to build abstract syntax trees. While it's nice for the parse to continue through syntax errors so the user can see all the errors, once it's done parsing I want to get an exception or some kind of indication that the input wasn't syntactically valid; that way I can stop the compilation and tell the user "sorry, fix your syntax errors and then try again". What I don't want is for it to spit out an incomplete AST based on what it thinks the user was trying to say, and continue to the next phase of compilation with no indication that anything went wrong (other than the error messages which went to the console and I can't see). Yet by default, it does exactly that.

The Definitive ANTLR Reference offers a technique to stop parsing as soon as a syntax error is detected: override the mismatch and recoverFromMismatchedSet methods to throw RecognitionExceptions, and add a @rulecatch action to do the same. This would seem to lose the benefit of recovering from parse errors, but more importantly, it only partially works. If a necessary token is missing (for instance, if a binary operator only has an expression on one side of it), it throws an exception just as expected, but if an extraneous token is added, ANTLR inserts the token that it thinks belongs there and continues on its merry way, producing an AST with no indication of a syntax error except a console message. (To make matters worse, the token it inserted was EOF, so the rest of the file didn't even get parsed.)

I'm sure I could fix this by, say, adding something like an isValid field to the parser and overriding methods and adding actions so that, at the end of the parse, it throws an exception if there were any errors. But is there a better way? I can't imagine that what I'm trying to do is unusual among ANTLR users.


Solution

  • ... [O]nce it's done parsing I want to get an exception or some kind of indication that the input wasn't syntactically valid; that way I can stop the compilation...

    You can call getNumberOfSyntaxErrors on both the lexer and the parser after parsing to determine if there was an error that was covertly accommodated by ANTLR. This doesn't tell you what those errors were, obviously, but I think these methods address the "once it's done parsing ... stop the compilation" part of your question.

    The Definitive ANTLR Reference offers a technique to stop parsing as soon as a syntax error is detected: override the mismatch and recoverFromMismatchedSet methods to throw RecognitionExceptions, and add a @rulecatch action to do the same.

    I don't think you mentioned which version of ANTLR you're using, but the documentation in the ANTLR v3.4 code for the method recoverFromMismatchedSet says it's "not currently used" and an Eclipse "global usage" scan found no callers. Neither here nor there to your main problem, but I wanted to mention it for the record. It may be the correct method to override for your version.

    If a necessary token is missing ..., [the overridden code] throws an exception just as expected, but if an extraneous token is added, ANTLR inserts the token that it thinks belongs there and continues on its merry way...

    Method recoverFromMismatchedToken tests for a recoverable missing and extraneous token by delegating to methods mismatchIsMissingToken and mismatchIsUnwantedToken respectively. If the appropriate method determines that an insertion or deletion will solve the problem, recoverFromMismatchedToken makes the appropriate correction. If it is determined that no operation solves the mismatched token problem, recoverFromMismatchedToken throws a MismatchedTokenException.

    If a recovery operation takes place, reportError is called, which calls displayRecognitionError with the details.

    This applies to ANTLR v3.4 and possibly earlier versions.

    This gives you at least two options:

    I'm partial towards overriding displayRecognitionError because it provides the error message text easily enough and because I know it's going to be called only after a token recovery operation and required state juggling -- no need for my parser to figure out how to recover for itself. This coupled with getNumberOfSyntaxErrors appear to give you the options that you're looking for, assuming that you're working with a relevant version of ANTLR and that I fully understood your problem.