parsingrustsoliditylexerpest

Building a Solidity parser in Rust, hitting an `expression can not fail` and `recursion` error


I am building a Solidity parser with Rust. I am using the Pest Parser crate and am setting up my grammar.pest file to be very similar to the Solidity repo's Lexer/Parser.

I am hitting two errors. The first is an error saying:

            |
          41 | callArgumentList = {LParen ~ ((expression ~ (Comma ~ expression)*)? | LBrace ~ (namedArgument ~ (Comma ~ namedArgument)*)? ~ RBrace) ~ RParen␊
             |                               ^-----------------------------------^
             |
             = expression cannot fail; following choices cannot be reached

This error is showing for this rule defined in my grammar.pest.

callArgumentList = {LParen ~ ((expression ~ (Comma ~ expression)*)? | LBrace ~ (namedArgument ~ (Comma ~ namedArgument)*)? ~ RBrace) ~ RParen
    }

The above syntax is trying to replicate this rule in the Solidity official parser.

I believe that I have my syntax to match the Solidity repo's syntax for this rule but I am not sure what is going on here.

Secondly, I am getting a recursive error.

        --> 131:6
              |
          131 |     (expression ~ LBrack ~ expression? ~ RBrack)␊
              |      ^--------^
              |
              = rule expression is left-recursive (expression -> expression); pest::prec_climber might be useful in this case
    

This second error is due to the grammar for an expression, however it must be recursive. Here is the rule from the Solidity Repo.

Below is my grammar.pest implementation.

expression = {
    (expression ~ LBrack ~ expression? ~ RBrack)
    | (expression ~ LBrack ~ expression? ~ Colon ~ expression? ~ RBrack)
    | (expression ~ Period ~ (identifier | Address))
    | (expression ~ LBrack ~ (namedArgument ~ (Comma ~ namedArgument)*)? ~ RBrack)
    | (expression ~ callArgumentList)
    | (Payable ~ callArgumentList)
    | (Type ~ LParen ~ typeName ~ RParen)
    | ((Inc | Dec | Not | BitNot | Delete | Sub) ~ expression)
    | (expression ~ (Inc | Dec))
    | (expression ~ Exp ~ expression)
    | (expression ~ (Mul | Div | Mod) ~ expression)
    | (expression ~ (Add | Sub) ~ expression)
    | (expression ~ (Shl | Sar | Shr) ~ expression)
    | (expression ~ BitAnd ~ expression)
    | (expression ~ BitXor ~ expression)
    | (expression ~ BitOr ~ expression)
    | (expression ~ (LessThan | GreaterThan | LessThanOrEqual | GreaterThanOrEqual) ~ expression)
    | (expression ~ (Equal | NotEqual) ~ expression)
    | (expression ~ And ~ expression)
    | (expression ~ Or ~ expression)
    | (expression ~ Conditional ~ expression ~ Colon ~ expression)
    | (expression ~ assignOp ~ expression)
    | (New ~ typeName)
    | (tupleExpression)
    | (inlineArrayExpression)
}

Can someone help me figure out how to fix these two errors?


Solution

  • The easiest fix for direct left recursion is to do the usual refactoring for operator precedence by inserting rules and ordering according to precedence. For example (with many alts removed to simplify):

    expression = { array }
    array = { arraycolon ~ ( LBrack ~ expression? ~ RBrack ) * }
    arraycolon = { qualified ~ ( LBrack ~ expression? ~ Colon ~ expression? ~ RBrack ) * }
    qualified = { arraycomma ~ ( Period ~ identifier ) * }
    arraycomma = { multexpr ~ ( LBrack ~ (namedArgument ~ (Comma ~ namedArgument)*)? ~ RBrack ) * }
    multexpr = { addexpr ~ ( (Mul | Div | Mod) ~ expression ) * }
    addexpr = { andexpr ~ ( (Add | Sub) ~ expression ) * }
    andexpr = { orexpr ~ ( And ~ expression ) * }
    orexpr = { identifier ~ ( Or ~ expression ) * }
    namedArgument = { identifier ~ Colon ~ expression }
    LBrack = @{ "[" }
    RBrack = @{ "]" }
    Colon = @{ ":" }
    Period = @{ "." }
    Comma = @{ "," }
    Mul = @{ "*" }
    Div = @{ "/" }
    Mod = @{ "%" }
    Add = @{ "+" }
    Sub = @{ "-" }
    And = @{ "&&" }
    Or = @{ "||" }
    identifier = { (ASCII_ALPHANUMERIC | "_" | "-")+ }
    WHITESPACE = _{ " " | "\t" | NEWLINE }
    COMMENT    = _{ "#" ~ (!NEWLINE ~ ANY)* }
    

    You can also do it using the usual refactoring that is in the Dragon book. Or, you can even use the native Pest feature to specify operator precedence.

    The other problem is caused because the ?-operator is in the wrong place in ( expression ~ ( Comma expression ) * )?. A similar problem is discussed here. The solution is a refactoring to remove ?-operator, the reintroduce it in the correct place:

    x : l (a? | b) r;
       ==> (eliminate ?-operator, useless parentheses)
    x : l ( | a | b) r;
       ==> (add ?-operator)
    x : l ( a | b )? r ;
    

    Using this and regrouping, a possible solution is:

    callArgumentList = { LParen ~ ( ( expression ~ ( Comma ~ expression ) * ) | ( LBrace ~ ( namedArgument ~ ( Comma ~ namedArgument ) * ) ? ~ RBrace ) )? ~ RParen }