oracle-database parsing antlr4 antlrworks

Lexer rule is recognized where it wasn't needed


I'm trying to use ANTLR 4 to create a simple grammar for some SELECT statements in Oracle DB, and I've run into a small problem. I have the following grammar:

Grammar & Lexer

column
: (tableAlias '.')? IDENT ((AS)? colAlias)?
| expression ((AS)? colAlias)?
| caseWhenClause ((AS)? colAlias)?
| rankAggregate ((AS)? colAlias)?
| rankAnalytic colAlias
;

colAlias
: '"' IDENT '"'
| IDENT
;

rankAnalytic
: RANK '(' ')' OVER '(' queryPartitionClause orderByClause ')'
;

RANK: R A N K;
fragment A:('a'|'A');
fragment N:('n'|'N');
fragment R:('r'|'R');
fragment K:('k'|'K');

The important part is the rankAnalytic alternative in the column rule. I declared that a colAlias must follow the rank expression, but when that alias is named rank (without quotes), the lexer emits it as a RANK token instead of an IDENT, so it is not accepted as a colAlias.

For example, given the following text:

 SELECT fulfillment_bundle_id, SKU, SKU_ACTIVE, PARENT_SKU, SKU_NAME, LAST_MODIFIED_DATE,
 RANK() over (PARTITION BY fulfillment_bundle_id, SKU, PARENT_SKU 
 order by ACTIVE DESC NULLS LAST,SKU_NAME) rank

the "rank" alias is underlined and flagged with the following error:
mismatched input 'rank' expecting {'"', IDENT}
I don't want it to be recognized as the RANK keyword here, only as an alias for the column.
I'm open to your suggestions :)


Solution

  • The RANK rule apparently appears above the IDENT rule. When two lexer rules match the same text, ANTLR picks the one defined first, so the string "rank" will never be emitted by the lexer as an IDENT token.

    A simple fix is to change the colAlias rule:

    colAlias
        : '"' ( IDENT | RANK ) '"'
        | ( IDENT | RANK ) 
        ;
    

    OP added:

    OK, but what if I have not just RANK but a whole list (>100) of such keywords as lexer rules? What am I supposed to do?

    If colAlias can be literally anything, then let it:

    colAlias
        : '"' .+? '"'    // must quote if multiple
        | .              // one token
        ;
    

    If that definition would incur ambiguities, a predicate is needed to qualify the match:

    colAlias
        : '"' m+=.+? '"' { check($m) }?  // multiple
        | o=.            { check($o) }?  // one 
        ;
    

    Functionally, the predicate is just another element in the subrule.
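
The answer does not define the `check` helper used in the predicates above. As a rough sketch only, assuming the default Java target, it could be a static method that takes the token text (for example, called as `AliasCheck.check($o.text)`). The class name `AliasCheck`, the reserved-word list, and the identifier pattern are all hypothetical and would need adapting to the real grammar:

```java
import java.util.Locale;
import java.util.Set;
import java.util.regex.Pattern;

public class AliasCheck {
    // Illustrative subset only (assumption): words that must never be
    // accepted as an unquoted alias. A real grammar would list its own.
    private static final Set<String> RESERVED =
            Set.of("SELECT", "FROM", "WHERE", "GROUP", "ORDER");

    // Assumed shape of an unquoted identifier: a letter first, then
    // letters, digits, '_', '$' or '#'.
    private static final Pattern IDENT_SHAPE =
            Pattern.compile("[A-Za-z][A-Za-z0-9_$#]*");

    /** Returns true if the token text may serve as an unquoted column alias. */
    public static boolean check(String tokenText) {
        if (tokenText == null || !IDENT_SHAPE.matcher(tokenText).matches()) {
            return false; // punctuation, operators, empty text, etc.
        }
        // Keywords are matched case-insensitively, like the RANK rule above.
        return !RESERVED.contains(tokenText.toUpperCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(check("rank")); // true: keyword-like, but not reserved here
        System.out.println(check("from")); // false: in the reserved subset
        System.out.println(check(","));    // false: not identifier-shaped
    }
}
```

Keeping the rejected words in one set means the parser rule stays as short as the wildcard version, and only the set grows as more keywords are added to the lexer.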