antlr4

NoViableAltException while parsing SQL statement, why


  1. Antlr version: antlr-4.13.1-complete.jar
  2. lexer.g4/parser.g4: Basically equal to github.com/antlr/grammars-v4/tree/master/sql/mysql/Oracle, Only three lines were added to handle situations where a string contains multiple SQL statements:
queryMulti
    : ((simpleStatement | beginWork) SEMICOLON_SYMBOL?)* EOF
    ;
  1. test sql is:
SELECT * FROM table_a AS a INNER JOIN table_b AS b ON a.field_1=b.field_2
  1. i added error listener and get:
15:18:48 Error: no viable alternative at input 'SELECT * FROM table_a AS a INNER JOIN table_b AS b ON aS.field_1' - Loc[1(55)] - Antlr4.Runtime.NoViableAltException: Exception of type 'Antlr4.Runtime.NoViableAltException' was thrown.
   at Antlr4.Runtime.Atn.ParserATNSimulator.ExecATN(DFA dfa, DFAState s0, ITokenStream input, Int32 startIndex, ParserRuleContext outerContext)
   at Antlr4.Runtime.Atn.ParserATNSimulator.AdaptivePredict(ITokenStream input, Int32 decision, ParserRuleContext outerContext)
   at WPF_DbTools.Antlr.MySQLParser.selectStatement() in dir\MySQLParser.cs:line 15862
exception:"Antlr4.Runtime.NoViableAltException"(in Antlr4.Runtime.Standard.dll)
  1. my csharp code like:
string sql = "SELECT * FROM table_a AS a INNER JOIN table_b AS b ON a.field_1=b.field_2";

ICharStream charStream = CharStreams.fromString(sql);
ITokenSource lexer = new MySQLLexer(charStream);
ITokenStream stream = new CommonTokenStream(lexer);
MySQLParser parser = new MySQLParser(stream);
parser.AddErrorListener(new MySQLParserErrorListener());
var treeAll = parser.queryMulti();

I found that the SQL in the exception message contains an extra 'S' character compared to the original text I parsed. Is this the reason for the parsing failure? I want to know how to fix this problem


Solution

  • There are several problems with your program.

    First, you should not be running the Antlr Tool manually. It is bad practice to not use a build tool to handle the build dependencies. Currently, you first run the Antlr4 Tool to generate the parser, lexer, listener, and visitor, as stated in cmd.txt: java -jar antlr-4.13.1-complete.jar -Dlanguage=CSharp -visitor -package WPF_DbTools.Antlr -o ../Antlr MySQLParser.g4 MySQLLexer.g4. Then, you manually have to copy the generated files from ../Antlr/ to ., which cmd.txt does not say. Further, the generated files are checked in, which is something you should never do.

    I recommend that you: (a) remove all generated files from the repo; (b) use Antlr4BuildTasks and clean up this build. Antlr4BuildTasks downloads a JRE, the Antlr Tool jar, and runs the tool completely behind the scene. Antlr4BuildTasks matches the version of the runtime with the version of the tool, another source of errors. MySQLParserDemo.csproj should be this:

    <Project Sdk="Microsoft.NET.Sdk">
        <PropertyGroup>
            <OutputType>Exe</OutputType>
            <TargetFramework>net8.0</TargetFramework>
            <ImplicitUsings>enable</ImplicitUsings>
        </PropertyGroup>
        <ItemGroup>
            <Antlr4 Include="MySQLLexer.g4">
                <AntOutDir>$(ProjectDir)</AntOutDir>
                <Package>WPF_DbTools.Antlr</Package>
            </Antlr4>
            <Antlr4 Include="MySQLParser.g4">
                <AntOutDir>$(ProjectDir)</AntOutDir>
                <Package>WPF_DbTools.Antlr</Package>
            </Antlr4>
        </ItemGroup>
        <ItemGroup>
            <PackageReference Include="Antlr4.Runtime.Standard" Version="4.13.1" />
            <PackageReference Include="Antlr4BuildTasks" version="12.8" PrivateAssets="all" />
        </ItemGroup>
    </Project>
    

    I removed the compile with nullable option because the Antlr4 runtime is not compiled with this option. With <Nullable>enable</Nullable> the build gives warnings for the lexer constructors.

    The parse error is:

    line 1:55 no viable alternative at input 'SELECT * FROM table_a AS A INNER JOIN table_b AS B ON aS.field_1'
    no viable alternative at input 'SELECT * FROM table_a AS A INNER JOIN table_b AS B ON aS.field_1' - Loc[1(55)] - Antlr4.Runtime.NoViableAltException: Exception of type 'Antlr4.Runtime.NoViableAltException' was thrown.
    

    The error occurs at .field_1 in the input.

    To check that the tokens produced by the CSharp port are equal to that for the Antlr4ng, you need to change your program:

    using Antlr4.Runtime;
    using WPF_DbTools.Antlr;
    
    var sql = "SELECT * FROM table_a AS a INNER JOIN table_b AS b ON a.field_1=b.field_2"; // Make sure the input is the same as posted in the SO question!!!!!!!!!!!!!!!!!
    var charStream = CharStreams.fromString(sql);
    var lexer = new MySQLLexer(charStream);
    var stream = new CommonTokenStream(lexer);
    var parser = new MySQLParser(stream);
    parser.RemoveErrorListeners();
    parser.AddErrorListener(new MySQLParserErrorListener());
    for (int i = 0; ; ++i)
    {
        var ro_token = lexer.NextToken();
        var token = (CommonToken)ro_token;
        token.TokenIndex = i;
        System.Console.WriteLine(token.ToString());
        if (token.Type == Antlr4.Runtime.TokenConstants.EOF)
            break;
    }
    lexer.Reset();
    var tree = parser.queryMulti();
    

    Antlr4ng:

    [@0,0:5='SELECT',<511>,1:0]
    [@1,6:6=' ',<817>,channel=1,1:6]
    [@2,7:7='*',<16>,1:7]
    [@3,8:8=' ',<817>,channel=1,1:8]
    [@4,9:12='FROM',<235>,1:9]
    [@5,13:13=' ',<817>,channel=1,1:13]
    [@6,14:20='table_a',<820>,1:14]
    [@7,21:21=' ',<817>,channel=1,1:21]
    [@8,22:23='AS',<63>,1:22]
    [@9,24:24=' ',<817>,channel=1,1:24]
    [@10,25:25='a',<820>,1:25]
    [@11,26:26=' ',<817>,channel=1,1:26]
    [@12,27:31='INNER',<271>,1:27]
    [@13,32:32=' ',<817>,channel=1,1:32]
    [@14,33:36='JOIN',<291>,1:33]
    [@15,37:37=' ',<817>,channel=1,1:37]
    [@16,38:44='table_b',<820>,1:38]
    [@17,45:45=' ',<817>,channel=1,1:45]
    [@18,46:47='AS',<63>,1:46]
    [@19,48:48=' ',<817>,channel=1,1:48]
    [@20,49:49='b',<820>,1:49]
    [@21,50:50=' ',<817>,channel=1,1:50]
    [@22,51:52='ON',<401>,1:51]
    [@23,53:53=' ',<817>,channel=1,1:53]
    [@24,54:54='a',<820>,1:54]
    [@25,55:55='.field_1',<28>,1:63]
    [@26,56:62='field_1',<820>,1:55]
    [@27,63:63='=',<6>,1:64]
    [@28,64:64='b',<820>,1:65]
    [@29,65:65='.field_2',<28>,1:74]
    [@30,66:72='field_2',<820>,1:66]
    [@31,73:72='<EOF>',<-1>,1:75]
    

    CSharp:

    [@0,0:5='SELECT',<511>,1:0]
    [@1,6:6=' ',<817>,channel=1,1:6]
    [@2,7:7='*',<16>,1:7]
    [@3,8:8=' ',<817>,channel=1,1:8]
    [@4,9:12='FROM',<235>,1:9]
    [@5,13:13=' ',<817>,channel=1,1:13]
    [@6,14:20='table_a',<820>,1:14]
    [@7,21:21=' ',<817>,channel=1,1:21]
    [@8,22:23='AS',<63>,1:22]
    [@9,24:24=' ',<817>,channel=1,1:24]
    [@10,25:25='a',<820>,1:25]
    [@11,26:26=' ',<817>,channel=1,1:26]
    [@12,27:31='INNER',<271>,1:27]
    [@13,32:32=' ',<817>,channel=1,1:32]
    [@14,33:36='JOIN',<291>,1:33]
    [@15,37:37=' ',<817>,channel=1,1:37]
    [@16,38:44='table_b',<820>,1:38]
    [@17,45:45=' ',<817>,channel=1,1:45]
    [@18,46:47='AS',<63>,1:46]
    [@19,48:48=' ',<817>,channel=1,1:48]
    [@20,49:49='b',<820>,1:49]
    [@21,50:50=' ',<817>,channel=1,1:50]
    [@22,51:52='ON',<401>,1:51]
    [@23,53:53=' ',<817>,channel=1,1:53]
    [@24,54:54='a',<820>,1:54]
    [@25,0:0='S',<28>,1:63]
    [@26,55:62='.field_1',<0>,1:55]
    [@27,63:63='=',<6>,1:64]
    [@28,64:64='b',<820>,1:65]
    [@29,1:1='E',<28>,1:74]
    [@30,65:72='.field_2',<0>,1:66]
    [@31,73:72='<EOF>',<-1>,1:75]
    

    The tokens between the ports are not identical. NB: make sure to check the token type first, then the token text. Your code does not work because you are not rewriting the Antlr4 Runtime calls in the .g4s. You could add override properties Type and Text in MySQLBaseLexer, but your code and Lexer both directly access two different backing fields which defeats the whole point of wrapping the field with a property. Change this.type to this.Type and this.text to this.Text in the MySQLLexer.g4, and remove your overrides for Type and Text in MySQLBaseLexer, then the code works. (NB: The original Antlr4ng port does produce the right token type for token number 25 (the type is 28), but it does not rewrite the text to be ., but is instead .field_1.)

    The problem with Antlr4 is that the grammars are not target independent. Antlr4 grammars can contain "actions", e.g., { this.type = this.isSqlModeActive(SqlMode.PipesAsConcat) ? MySQLLexer.CONCAT_PIPES_SYMBOL : MySQLLexer.LOGICAL_OR_OPERATOR; }, which is TypeScript source code. This code must be rewritten for a port to CSharp. The difficulty is that the Anrlr4 runtime API is not identical between ports. So, you need to match up the fields, methods, and properties between the ports and make sure to use the correct API call.

    Ideally, an Antlr4 grammar should be "target agnostic", which tries to write actions in a syntactically neutral target. In almost all targets, a function or method call is identical (an identifier followed by ()). You can read about "target agnostic" here: https://github.com/antlr/antlr4/blob/dev/doc/target-agnostic-grammars.md

    Your fixed program is here: https://github.com/kaby76/MySqlParser