antlr, antlr4

How to express a "call form" expr syntax like 'func arg arg' in Antlr4?


Consider my simple grammar, skipping the lexer here:

expr:
    expr expr+                           # CallFormExpr // ⬅ the question here!
    | L_PAREN expr R_PAREN               # ParensExpr
    | L_BRACKET (expr COMMA?)* R_BRACKET # SquareBracketExpr
    | L_CURLY (expr COMMA?)* R_CURLY     # CurlyBracesExpr
    | lit                                # LitExpr
    | ident                              # IdentExpr
;

ident : IDENTIFIER | OPERATOR;

lit:
    RUNE_LIT
    | RAW_STRING_LIT
    | INTERPRETED_STRING_LIT
    | IMAGINARY_LIT
    | FLOAT_LIT
    | DECIMAL_LIT
    | BINARY_LIT
    | OCTAL_LIT
    | HEX_LIT
;

Now, the very first alternative of expr, expr expr+, is meant to capture foo bar baz as a single expr with three ident children, but ANTLR4 Lab gives me this parse tree, which is more akin to foo (bar baz):

[Image: parse tree, nested instead of a flat list]

The problem: instead of that nested tree, I'd want a single flat node with foo, bar and baz as sibling children.

Due to the recursion, the flat list gets "treeified". Is it a matter of how the alternatives of expr are ordered? Or is there some ANTLR "hinting syntax" to keep it flat?


Solution

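  • Because expr expr+ sits inside expr itself, each operand of a call form is again an expr that can match another call form, so you'll always get the nested parse tree you observe. To get around it, move the CallFormExpr alternative out of expr and into a rule of its own. Something like this will do the trick: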

    parse:
        call_form_expr EOF
    ;
    
    call_form_expr:
        expr expr+                           # CallFormExpr
        | expr                               # NormalExpr
    ;
    
    expr:
        L_PAREN expr R_PAREN                 # ParensExpr
        | L_BRACKET (expr COMMA?)* R_BRACKET # SquareBracketExpr
        | L_CURLY (expr COMMA?)* R_CURLY     # CurlyBracesExpr
        | lit                                # LitExpr
        | ident                              # IdentExpr
    ;
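
    One caveat with the grammar above: a call form can now only appear at the top level, so input such as (foo bar) baz is no longer accepted, because ParensExpr wraps a plain expr. If nested call forms are wanted, one possible variation (a sketch only, not tested against your lexer) is to let the parentheses wrap call_form_expr instead. The bracket and brace alternatives are left unchanged here, since with the optional COMMA it would be ambiguous whether [a b c] means three elements or one call:

    call_form_expr:
        expr expr+                           # CallFormExpr
        | expr                               # NormalExpr
    ;

    expr:
        L_PAREN call_form_expr R_PAREN       # ParensExpr // the only change
        | L_BRACKET (expr COMMA?)* R_BRACKET # SquareBracketExpr
        | L_CURLY (expr COMMA?)* R_CURLY     # CurlyBracesExpr
        | lit                                # LitExpr
        | ident                              # IdentExpr
    ;

    With that change, (foo bar) baz should come out as a CallFormExpr with two expr children, the first being a ParensExpr that holds its own flat CallFormExpr. The recursion is indirect and always starts with the L_PAREN token, so it is not left recursion.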