parsing tatsu

Reporting as many distinct syntax errors as possible in TatSu


I'm implementing an interpreter for a language that is first parsed by TatSu and then interpreted. I'd like the interpreter to report as many errors in the source as possible in a single run, including the FailedParse errors raised by the underlying TatSu parser.

As far as I know, TatSu raises only a single FailedParse exception and then stops parsing.

Is there a way to wrap the TatSu parser so that it resumes parsing and reports any further syntax errors in the source?


Solution

  • You need error recovery. You can read about that topic on the Web.

    TatSu doesn't do error recovery on its own, and provides only some support for it.

    This is the idea. Given this part of a grammar:

    block = {statement ';'}+ ;

    statement =
        | if_statement
        | expression
        ;
    

    You alter the grammar to add an error recovery rule.

    block = {statement ';'}+ ;

    statement =
        | if_statement
        | expression
        | statement_error
        ;
    
    statement_error = ->&';' ;  # skip until a semicolon is seen
    

    The same kind of recovery can be applied throughout the grammar.

    Then, in the semantic actions, the parser logs an error message for each of the xxx_error rules that matched.

    There are other ways to arrange the rules, and where to place the "skip" (->) expression is up to taste.

    TatSu could provide more support for error recovery, and it will probably start by resolving issue #203.
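
To make the "skip until a semicolon, log, and carry on" idea concrete, here is a minimal hand-written sketch in plain Python. It does not use TatSu at all: the toy statement syntax (`name = integer`), the `STATEMENT` pattern, and the `parse_block` function are hypothetical names invented for this illustration. The recovery branch only plays the role that `statement_error = ->&';' ;` plays in the grammar above.

```python
import re

# Hypothetical toy statement: `name = integer`, which must be followed by ';'.
STATEMENT = re.compile(r"\s*([A-Za-z_]\w*)\s*=\s*(\d+)\s*(?=;)")


def parse_block(source):
    """Parse `{statement ';'}` and collect all errors instead of stopping at the first."""
    statements, errors = [], []
    pos = 0
    while source[pos:].strip():  # stop when only whitespace is left
        match = STATEMENT.match(source, pos)
        if match:
            statements.append((match.group(1), int(match.group(2))))
            pos = match.end() + 1  # step over the ';'
            continue
        # Recovery branch: skip ahead to the next ';' (or to the end of the
        # input), log what was skipped, and resume with the next statement.
        end = source.find(";", pos)
        if end == -1:
            end = len(source)
        errors.append((pos, source[pos:end].strip()))
        pos = end + 1
    return statements, errors
```

Calling `parse_block("x = 1; y = ; z = 3; 4 = w; q = 5;")` returns the three valid statements together with two logged errors, one for `y =` and one for `4 = w`, rather than failing at the first bad statement. In a TatSu-based parser the logging would live in the semantic action for the recovery rule instead of in a hand-written loop.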