compiler-construction antlr antlr4 context-free-grammar ebnf

How can I tell ANTLR to use only the branch I specify, and not other branches?

Here is my ANTLR grammar. It is divided into two sections, parameters and constraints. The parameters section consists of many rows, and each row represents a parameter and its values. A parameter is separated from its values by a colon (:), and the values are separated from one another by commas (,).

The grammar of the constraints section is given in PICT's GitHub repository; I converted it into ANTLR grammar format.

grammar Pict;
model:parameters? constraints?;
//The part of Parameters and Values of Parameters
parameters:parameterRow+ '\n'*;
parameterRow: ' '* parameterName  SEMI  parameterValue (',' ' '* parameterValue)* '\n'*;
parameterName: Value ;
parameterValue:NUMBER|Value;

//The part of submodel
//submodel:;

//The part of constraints
constraints: constraint+ '\n'*;
constraint:(predicate ';'? '\n'*)|((IF|IFNOT) predicate THEN predicate (ELSE predicate)?) ';'? '\n'*;
predicate:
clause
|(clause LogicalOperator predicate)
;
clause:term
|'(' ' '* predicate ' '* ')'
|NOT predicate
;

term:
'['parameterName']' ' '* IN ' '*  '{' ' '* (String|NUMBER) ' '* (',' ' '* (NUMBER|String))* ' '* '}' #inStatment
|'['parameterName']' ' '* Relation ' '* (NUMBER|String) #relationValueStatement
| '['parameterName']' ' '* LIKE' '*  (NUMBER|String) #likeStatement
|'['parameterName']' ' '* Relation ' '* '['parameterName']'#relationParaStatement
;




SEMI:[ ]*':'[ ]* {setText(getText().trim());};
IN: ([ ]* 'in' [ ]* | [ ]* 'IN' [ ]*) {setText(getText().trim());};
LIKE:([ ]* ('LIKE'|'like') [ ]*) {setText(getText().trim());};
Relation:  ('='|'<>'|'>'|'>='|'<'|'<=' ) {setText(getText().trim());};
IF:[ '\n']* ('IF'|'if') [ '\n']*;
IFNOT:[ '\n']* ('IF NOT'|'if not') [ '\n']*;
THEN:[ '\n']* ('THEN'|'then') [ '\n']*;
ELSE:[ '\n']* ('ELSE'|'else') [ '\n']*;
NOT:[ '\n']* ('NOT'|'not') [ '\n']*;
LogicalOperator:([ '\n']* ('and'|'AND') [ '\n']*)|([ '\n']* ('OR'|'or') [ '\n']*) {setText(getText().trim());};

NUMBER
    :   '-'? INT '.' INT EXP?   // 1.35, 1.35E-9, 0.3, -4.5
    |   '-'? INT EXP            // 1e10 -3e4
    |   '-'? INT             // -3, 45
    ;

Value:LETTERNoWhiteSpace[-.?!a-zA-Z\u4e00-\u9fa5_0-9\u3002|\uff1f|\uff01|\uff0c|\u3001|\uff1b|\uff1a|\u201c|\u201d|\u2018|\u2019|\uff08|\uff09|\u300a|\u300b|\u3008|\u3009|\u3010|\u3011|\u300e|\u300f|\u300c|\u300d|\ufe43|\ufe44|\u3014|\u3015|\u2026|\u2014|\uff5e|\ufe4f|\uffe5]*(' ')?[-.?!a-zA-Z\u4e00-\u9fa5_0-9\u3002|\uff1f|\uff01|\uff0c|\u3001|\uff1b|\uff1a|\u201c|\u201d|\u2018|\u2019|\uff08|\uff09|\u300a|\u300b|\u3008|\u3009|\u3010|\u3011|\u300e|\u300f|\u300c|\u300d|\ufe43|\ufe44|\u3014|\u3015|\u2026|\u2014|\uff5e|\ufe4f|\uffe5]*{setText(getText().trim());};



String:('"' .*? '"') {setText(getText().trim());};
WS:[ \t\r\n]+ -> skip ;
COMMENT: '#' .*? '\n' ->skip;
fragment INT :   '0' | '1'..'9' '0'..'9'* ; // no leading zeros
fragment EXP :   [Ee] [+\-]? INT ; // \- since - means "range" inside [...]
fragment
LETTERNoWhiteSpace:[-a-zA-Z\u4e00-\u9fa5_0-9];

For the lexer rule Value, I need to match all English and Chinese characters, as well as English and Chinese punctuation, so I used Unicode escapes (starting with \u) to do it.

My input is:

Size:  1, 2, 3, 4, 5
Value: a, b, c, d

IF [Size] > 3 THEN [Value] > "b";

and ANTLR reports that:

line 4:12 no viable alternative at input '[Size] > 3 THEN'

Syntax Tree right here

I found that 3 THEN is matched by the lexer rule Value, but I want 3 to be matched by the NUMBER or String rule, as in my grammar above. THEN is a keyword, so it should not be matched as part of a Value.

How can I change my grammar to solve this problem? Thanks!

Solution

It's probably going to help to clean things up a bit (will make things easier to digest).

Most obvious: you have a WS rule with a skip action, so you can drop all of the [ ]* (and similar) stuff. This also means you don't need the {setText(getText().trim());} stuff.
You can use options { caseInsensitive = true; } to avoid things like IF: ('IF' | 'if'); (this option requires ANTLR 4.10 or later).
A | in a set ([abd|c]) is the actual | character, not an or operator, so you don't want stuff like \uff0c|\u3001|\uff1b|\uff1a (it should be \uff0c\u3001\uff1b\uff1a).
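
The point about | inside a set can be checked outside ANTLR. Python's re module happens to treat | inside a character class the same way, as a literal member of the set, so here is a minimal sketch. It is an analogy rather than ANTLR itself, and the three punctuation marks are just a sample taken from the Value rule:

```python
import re

# Assumption: three sample marks from the Value rule (fullwidth comma,
# ideographic comma, fullwidth semicolon), written the two ways discussed.
with_pipes = re.compile(r"[\uff0c|\u3001|\uff1b]")    # '|' becomes a member of the set
without_pipes = re.compile(r"[\uff0c\u3001\uff1b]")   # only the three marks

print(bool(with_pipes.fullmatch("|")))          # True: the set accepts a bare '|'
print(bool(without_pipes.fullmatch("|")))       # False
print(bool(without_pipes.fullmatch("\u3001")))  # True: the marks still match
```

So the | separators do not make the set wrong for the intended punctuation; they only add an extra, probably unintended, character to it.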

Applying those changes gives you:

grammar Pict
    ;

options {
    caseInsensitive = true;
}

model: parameterRow* constraint*;
//The part of Parameters and Values of Parameters parameters: parameterRow;
parameterRow
    : parameterName COLON parameterValue (',' parameterValue)*
    ;
parameterName:  Value;
parameterValue: NUMBER | Value;

//The part of submodel submodel:;

//The part of constraints constraints: constraint+;
constraint
    : predicate ';'?
    | (IF | IFNOT) predicate THEN predicate (ELSE predicate)? ';'?
    ;
predicate: clause | (clause LogicalOperator predicate);
clause:    term | '(' predicate ')' | NOT predicate;

term
    : '[' parameterName ']' IN '{' (String | NUMBER) (
        ',' (NUMBER | String)
    )* '}'                                                 # inStatment
    | '[' parameterName ']' Relation (NUMBER | String)     # relationValueStatement
    | '[' parameterName ']' LIKE (NUMBER | String)         # likeStatement
    | '[' parameterName ']' Relation '[' parameterName ']' # relationParaStatement
    ;

COLON: ':';
IN:    'in';

LIKE:     'like';
Relation: ('=' | '<>' | '>' | '>=' | '<' | '<=');

IF:              'if';
IFNOT:           'if not';
THEN:            'then';
ELSE:            'else';
NOT:             'not';
LogicalOperator: ('and' | 'or');

NUMBER
    : '-'? INT '.' INT EXP? // 1.35, 1.35E-9, 0.3, -4.5
    | '-'? INT EXP // 1e10 -3e4
    | '-'? INT // -3, 45
    ;

Value
    : LETTERNoWhiteSpace
        [-.?!a-z\u4e00-\u9fa5_0-9\u3002\uff1f\uff01\uff0c\u3001\uff1b\uff1a\u201c\u201d\u2018\u2019\uff08\uff09\u300a\u300b\u3008\u3009\u3010\u3011\u300e\u300f\u300c\u300d\ufe43\ufe44\u3014\u3015\u2026\u2014\uff5e\ufe4f\uffe5]
        (
        ' '?
            [-.?!a-z\u4e00-\u9fa5_0-9\u3002\uff1f\uff01\uff0c\u3001\uff1b\uff1a\u201c\u201d\u2018\u2019\uff08\uff09\u300a\u300b\u3008\u3009\u3010\u3011\u300e\u300f\u300c\u300d\ufe43\ufe44\u3014\u3015\u2026\u2014\uff5e\ufe4f\uffe5]
    )*
    ;

String:       '"' .*? '"';
WS:           [ \t\r\n]+   -> skip;
COMMENT:      '#' .*? '\n' -> skip;
fragment INT: '0' | '1' ..'9' '0' ..'9'*; // no leading zeros
fragment EXP
    : 'e' [+\-]? INT
    ; // \- since - means "range" inside [...]
fragment LETTERNoWhiteSpace: [a-z\u4e00-\u9fa5_0-9];

Running this on your input produces the following errors:

line 2:7 token recognition error at: 'a,'
line 2:10 token recognition error at: 'b,'
line 2:13 token recognition error at: 'c,'
line 2:16 token recognition error at: 'd\n'
line 4:0 missing {NUMBER, Value} at 'IF'

So we can see that this Value rule doesn't recognize single-letter values. If you modify it to:

Value
    : LETTERNoWhiteSpace (
        [-.?!a-z\u4e00-\u9fa5_0-9\u3002\uff1f\uff01\uff0c\u3001\uff1b\uff1a\u201c\u201d\u2018\u2019\uff08\uff09\u300a\u300b\u3008\u3009\u3010\u3011\u300e\u300f\u300c\u300d\ufe43\ufe44\u3014\u3015\u2026\u2014\uff5e\ufe4f\uffe5]
            (
            ' '?
                [-.?!a-z\u4e00-\u9fa5_0-9\u3002\uff1f\uff01\uff0c\u3001\uff1b\uff1a\u201c\u201d\u2018\u2019\uff08\uff09\u300a\u300b\u3008\u3009\u3010\u3011\u300e\u300f\u300c\u300d\ufe43\ufe44\u3014\u3015\u2026\u2014\uff5e\ufe4f\uffe5]
        )*
    )?
    ;

(Note: This rule is quite complex, and, by allowing embedded spaces, is likely to cause some problems with tokenization in more complex examples than yours, but it works fine for your sample input.)
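
To see why the embedded space matters, here is a rough Python simulation of how ANTLR's lexer picks a token: the longest match wins, and on a tie the rule defined first wins. It is only a sketch under simplifying assumptions: the patterns are ASCII-only stand-ins for THEN, NUMBER and Value (no CJK ranges, no exponents), and first_token is a hypothetical helper, not part of ANTLR.

```python
import re

# Assumption: ASCII-only stand-ins for LETTERNoWhiteSpace and the big
# character set that follows it in the Value rule.
HEAD = r"[a-z0-9_]"
TAIL = r"[-.?!a-z0-9_]"

# Value as written in the question: an optional space may follow the head directly.
ORIGINAL = f"{HEAD}{TAIL}* ?{TAIL}*"
# Value as corrected above: a space is only allowed after a second character.
REVISED = f"{HEAD}(?:{TAIL}(?: ?{TAIL})*)?"


def rules(value_pattern):
    """Token rules in definition order; earlier rules win ties in length."""
    return [
        ("THEN", r"then"),
        ("NUMBER", r"-?(?:0|[1-9][0-9]*)"),
        ("Value", value_pattern),
    ]


def first_token(text, rule_list):
    """Return (rule name, matched text) for the start of text."""
    best = None
    for name, pattern in rule_list:
        m = re.compile(pattern, re.IGNORECASE).match(text)
        # Strictly longer matches replace the current best, so ties keep
        # the rule that was listed first.
        if m and (best is None or len(m.group()) > len(best[1])):
            best = (name, m.group())
    return best


print(first_token("3 THEN [Value]", rules(ORIGINAL)))   # ('Value', '3 THEN')
print(first_token("3 THEN [Value]", rules(REVISED)))    # ('NUMBER', '3')
print(first_token("THEN [Value]", rules(REVISED)))      # ('THEN', 'THEN')
print(first_token("33 THEN [Value]", rules(REVISED)))   # ('Value', '33 THEN')
```

The first line reproduces the problem from the question, and the next two show why the sample input now works. The last line is the kind of case the note warns about: a value of two or more characters followed by a space and a keyword is still swallowed into a single Value token.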

With that change there are no errors for your sample input, and you get the following tree: