Here is my ANTLR grammar:
It is divided into two sections, parameters and constraints.
The parameters section consists of many rows; each row represents a parameter and its values. A parameter is separated from its values by a colon (:), and the values are separated from each other by a comma (,).
The grammar of the constraints section is given in PICT's GitHub repository; I converted it into ANTLR grammar format.
grammar Pict;
model:parameters? constraints?;
//The part of Parameters and Values of Parameters
parameters:parameterRow+ '\n'*;
parameterRow: ' '* parameterName SEMI parameterValue (',' ' '* parameterValue)* '\n'*;
parameterName: Value ;
parameterValue:NUMBER|Value;
//The part of submodel
//submodel:;
//The part of constraints
constraints: constraint+ '\n'*;
constraint:(predicate ';'? '\n'*)|((IF|IFNOT) predicate THEN predicate (ELSE predicate)?) ';'? '\n'*;
predicate:
clause
|(clause LogicalOperator predicate)
;
clause:term
|'(' ' '* predicate ' '* ')'
|NOT predicate
;
term:
'['parameterName']' ' '* IN ' '* '{' ' '* (String|NUMBER) ' '* (',' ' '* (NUMBER|String))* ' '* '}' #inStatment
|'['parameterName']' ' '* Relation ' '* (NUMBER|String) #relationValueStatement
| '['parameterName']' ' '* LIKE' '* (NUMBER|String) #likeStatement
|'['parameterName']' ' '* Relation ' '* '['parameterName']'#relationParaStatement
;
SEMI:[ ]*':'[ ]* {setText(getText().trim());};
IN: ([ ]* 'in' [ ]* | [ ]* 'IN' [ ]*) {setText(getText().trim());};
LIKE:([ ]* ('LIKE'|'like') [ ]*) {setText(getText().trim());};
Relation: ('='|'<>'|'>'|'>='|'<'|'<=' ) {setText(getText().trim());};
IF:[ '\n']* ('IF'|'if') [ '\n']*;
IFNOT:[ '\n']* ('IF NOT'|'if not') [ '\n']*;
THEN:[ '\n']* ('THEN'|'then') [ '\n']*;
ELSE:[ '\n']* ('ELSE'|'else') [ '\n']*;
NOT:[ '\n']* ('NOT'|'not') [ '\n']*;
LogicalOperator:([ '\n']* ('and'|'AND') [ '\n']*)|([ '\n']* ('OR'|'or') [ '\n']*) {setText(getText().trim());};
NUMBER
: '-'? INT '.' INT EXP? // 1.35, 1.35E-9, 0.3, -4.5
| '-'? INT EXP // 1e10 -3e4
| '-'? INT // -3, 45
;
Value:LETTERNoWhiteSpace[-.?!a-zA-Z\u4e00-\u9fa5_0-9\u3002|\uff1f|\uff01|\uff0c|\u3001|\uff1b|\uff1a|\u201c|\u201d|\u2018|\u2019|\uff08|\uff09|\u300a|\u300b|\u3008|\u3009|\u3010|\u3011|\u300e|\u300f|\u300c|\u300d|\ufe43|\ufe44|\u3014|\u3015|\u2026|\u2014|\uff5e|\ufe4f|\uffe5]*(' ')?[-.?!a-zA-Z\u4e00-\u9fa5_0-9\u3002|\uff1f|\uff01|\uff0c|\u3001|\uff1b|\uff1a|\u201c|\u201d|\u2018|\u2019|\uff08|\uff09|\u300a|\u300b|\u3008|\u3009|\u3010|\u3011|\u300e|\u300f|\u300c|\u300d|\ufe43|\ufe44|\u3014|\u3015|\u2026|\u2014|\uff5e|\ufe4f|\uffe5]*{setText(getText().trim());};
String:('"' .*? '"') {setText(getText().trim());};
WS:[ \t\r\n]+ -> skip ;
COMMENT: '#' .*? '\n' ->skip;
fragment INT : '0' | '1'..'9' '0'..'9'* ; // no leading zeros
fragment EXP : [Ee] [+\-]? INT ; // \- since - means "range" inside [...]
fragment
LETTERNoWhiteSpace:[-a-zA-Z\u4e00-\u9fa5_0-9];
For the lexer rule Value, I need it to match all English and Chinese characters, as well as all English and Chinese punctuation, so I used Unicode escapes (starting with \u) to do it.
My input is:
Size: 1, 2, 3, 4, 5
Value: a, b, c, d
IF [Size] > 3 THEN [Value] > "b";
and ANTLR reports:
line 4:12 no viable alternative at input '[Size] > 3 THEN'
I found that 3 THEN is matched by the lexer rule Value, but I want 3 to be matched by the rule NUMBER (or String), as in my grammar above, and THEN is a keyword, so it should not be matched as part of a Value.
How can I change my grammar to solve this problem? Thanks!
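The symptom follows from how an ANTLR lexer chooses between rules: at each position it takes the rule with the longest match, and only on a tie does it prefer the rule defined first. The Python sketch below simulates that policy with regular expressions that roughly approximate THEN, NUMBER and Value from the grammar above. It is a simplified illustration, not ANTLR's implementation: NUMBER covers integers only, Value keeps only the ASCII part of its character set, and the names RULES and tokenize are made up for the example.

```python
import re

# Rough regex stand-ins for rules of the question's grammar, listed in the
# same order as in the .g4 file (order only matters for ties).
# Simplifications: NUMBER covers integers only; Value keeps only the ASCII
# part of its character set.
CHARS = r"[-.?!a-zA-Z_0-9]"
RULES = [
    ("THEN", re.compile(r"[ \n]*(?:THEN|then)[ \n]*")),
    ("NUMBER", re.compile(r"-?(?:0|[1-9][0-9]*)")),
    ("Value", re.compile(r"[-a-zA-Z_0-9]" + CHARS + r"*(?: )?" + CHARS + r"*")),
    ("WS", re.compile(r"[ \t\r\n]+")),
]


def tokenize(text):
    """Longest match wins; on a tie the earlier rule in RULES wins."""
    tokens, pos = [], 0
    while pos < len(text):
        best_name, best_end = None, pos
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            # Only a strictly longer match replaces the current best,
            # so an earlier rule keeps a tie.
            if m and m.end() > best_end:
                best_name, best_end = name, m.end()
        if best_name is None:
            raise ValueError(f"token recognition error at offset {pos}")
        if best_name != "WS":  # mimic '-> skip'
            tokens.append((best_name, text[pos:best_end]))
        pos = best_end
    return tokens


print(tokenize("3"))       # [('NUMBER', '3')]: a tie, so NUMBER (defined first) wins
print(tokenize("3 THEN"))  # [('Value', '3 THEN')]: Value's match is longer
```

Because Value in the original grammar allows one embedded space, "3 THEN" (six characters) beats NUMBER's "3" (one character), which is exactly the token that the parser then fails on.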
It will probably help to clean things up a bit (it makes the grammar easier to digest).
Most obvious: you have a WS rule with a skip action, so you can drop all of the [ ]* (and similar) patterns. This also means you don't need the {setText(getText().trim());} actions.
You can use options { caseInsensitive = true; } (available since ANTLR 4.10) to avoid things like IF: ('IF' | 'if');
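Besides being shorter, the option is more permissive than the two-alternative form: ('IF' | 'if') rejects mixed-case spellings such as If, while a case-insensitive literal accepts every combination. Python regular expressions can illustrate the difference; this is only an analogy, since re.IGNORECASE is Python's flag, not ANTLR's option, and the pattern names are made up for the example.

```python
import re

# ('IF' | 'if') spelled out as in the question's grammar
two_spellings = re.compile(r"IF|if")
# Roughly what the literal 'if' means under caseInsensitive = true
any_case = re.compile(r"if", re.IGNORECASE)

for word in ["IF", "if", "If", "iF"]:
    # Columns: word, accepted by the two-spelling form, accepted case-insensitively
    print(word, bool(two_spellings.fullmatch(word)), bool(any_case.fullmatch(word)))
```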
A | inside a set ([abd|c]) is the literal | character, not an "or" operator, so you don't want things like \uff0c|\u3001|\uff1b|\uff1a (that should be \uff0c\u3001\uff1b\uff1a).
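Python's regular expressions treat | inside a character class the same way, so they offer a quick way to see the effect. This is an analogy, not ANTLR syntax; the two pattern names are made up for the example.

```python
import re

# Inside [...] the '|' is an ordinary member of the set, not alternation.
with_bars = re.compile("[\uff0c|\u3001]")   # fullwidth comma, '|', ideographic comma
without_bars = re.compile("[\uff0c\u3001]")  # fullwidth comma, ideographic comma

print(bool(with_bars.fullmatch("|")))          # True: '|' slipped into the set
print(bool(without_bars.fullmatch("|")))       # False
print(bool(without_bars.fullmatch("\u3001")))  # True
```

In the question's Value rule this means an ASCII | is silently accepted as part of a value.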
This gives you:
grammar Pict;
options {
caseInsensitive = true;
}
model: parameterRow* constraint*;
// The part of Parameters and Values of Parameters
// parameters: parameterRow;
parameterRow
: parameterName COLON parameterValue (',' parameterValue)*
;
parameterName: Value;
parameterValue: NUMBER | Value;
// The part of submodel
// submodel:;
// The part of constraints
// constraints: constraint+;
constraint
: predicate ';'?
| (IF | IFNOT) predicate THEN predicate (ELSE predicate)? ';'?
;
predicate: clause | (clause LogicalOperator predicate);
clause: term | '(' predicate ')' | NOT predicate;
term
: '[' parameterName ']' IN '{' (String | NUMBER) (',' (NUMBER | String))* '}' # inStatment
| '[' parameterName ']' Relation (NUMBER | String) # relationValueStatement
| '[' parameterName ']' LIKE (NUMBER | String) # likeStatement
| '[' parameterName ']' Relation '[' parameterName ']' # relationParaStatement
;
COLON: ':';
IN: 'in';
LIKE: 'like';
Relation: ('=' | '<>' | '>' | '>=' | '<' | '<=');
IF: 'if';
IFNOT: 'if not';
THEN: 'then';
ELSE: 'else';
NOT: 'not';
LogicalOperator: ('and' | 'or');
NUMBER
: '-'? INT '.' INT EXP? // 1.35, 1.35E-9, 0.3, -4.5
| '-'? INT EXP // 1e10 -3e4
| '-'? INT // -3, 45
;
Value
: LETTERNoWhiteSpace
[-.?!a-z\u4e00-\u9fa5_0-9\u3002\uff1f\uff01\uff0c\u3001\uff1b\uff1a\u201c\u201d\u2018\u2019\uff08\uff09\u300a\u300b\u3008\u3009\u3010\u3011\u300e\u300f\u300c\u300d\ufe43\ufe44\u3014\u3015\u2026\u2014\uff5e\ufe4f\uffe5]
(
' '?
[-.?!a-z\u4e00-\u9fa5_0-9\u3002\uff1f\uff01\uff0c\u3001\uff1b\uff1a\u201c\u201d\u2018\u2019\uff08\uff09\u300a\u300b\u3008\u3009\u3010\u3011\u300e\u300f\u300c\u300d\ufe43\ufe44\u3014\u3015\u2026\u2014\uff5e\ufe4f\uffe5]
)*
;
String: '"' .*? '"';
WS: [ \t\r\n]+ -> skip;
COMMENT: '#' .*? '\n' -> skip;
fragment INT: '0' | '1' ..'9' '0' ..'9'*; // no leading zeros
fragment EXP
: 'e' [+\-]? INT
; // \- since - means "range" inside [...]
fragment LETTERNoWhiteSpace: [a-z\u4e00-\u9fa5_0-9];
With your input, this produces the following errors:
line 2:7 token recognition error at: 'a,'
line 2:10 token recognition error at: 'b,'
line 2:13 token recognition error at: 'c,'
line 2:16 token recognition error at: 'd\n'
line 4:0 missing {NUMBER, Value} at 'IF'
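These errors can be reproduced with a Python regular expression that mirrors the structure of the cleaned-up Value rule above: LETTERNoWhiteSpace followed by one mandatory set character, so every match needs at least two characters and a single letter such as a is never a Value. This is a sketch: the Chinese punctuation is left out of the set, re.IGNORECASE stands in for caseInsensitive, and the names FIRST, REST and value are made up for the example.

```python
import re

FIRST = "[a-z\u4e00-\u9fa5_0-9]"     # LETTERNoWhiteSpace
REST = "[-.?!a-z\u4e00-\u9fa5_0-9]"  # the long set, Chinese punctuation omitted

# Value : LETTERNoWhiteSpace SET (' '? SET)* ;
value = re.compile(FIRST + REST + "(?: ?" + REST + ")*", re.IGNORECASE)

for text in ["a", "ab", "Size"]:
    print(repr(text), bool(value.fullmatch(text)))  # 'a' fails, the others match
```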
So we can see that your Value rule doesn't recognize single-letter values. If you modify it to:
Value
: LETTERNoWhiteSpace (
[-.?!a-z\u4e00-\u9fa5_0-9\u3002\uff1f\uff01\uff0c\u3001\uff1b\uff1a\u201c\u201d\u2018\u2019\uff08\uff09\u300a\u300b\u3008\u3009\u3010\u3011\u300e\u300f\u300c\u300d\ufe43\ufe44\u3014\u3015\u2026\u2014\uff5e\ufe4f\uffe5]
(
' '?
[-.?!a-z\u4e00-\u9fa5_0-9\u3002\uff1f\uff01\uff0c\u3001\uff1b\uff1a\u201c\u201d\u2018\u2019\uff08\uff09\u300a\u300b\u3008\u3009\u3010\u3011\u300e\u300f\u300c\u300d\ufe43\ufe44\u3014\u3015\u2026\u2014\uff5e\ufe4f\uffe5]
)*
)?
;
(Note: this rule is quite complex and, because it allows embedded spaces, is likely to cause tokenization problems in inputs more complex than yours, but it works fine for your sample input.)
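To make that caveat concrete: because ANTLR's lexer prefers the longest match (and the earlier rule on a tie), whether a number followed by a keyword lexes correctly depends on the number's length. A rough Python check, with regexes approximating NUMBER (integers only) and the modified Value rule (Chinese punctuation omitted, re.IGNORECASE standing in for caseInsensitive); the helper winner is hypothetical and only compares these two rules:

```python
import re

FIRST = "[a-z\u4e00-\u9fa5_0-9]"     # LETTERNoWhiteSpace
REST = "[-.?!a-z\u4e00-\u9fa5_0-9]"  # the long set, Chinese punctuation omitted

number = re.compile(r"-?(?:0|[1-9][0-9]*)")
# Value : LETTERNoWhiteSpace (SET (' '? SET)*)? ;
value = re.compile(FIRST + "(?:" + REST + "(?: ?" + REST + ")*)?", re.IGNORECASE)


def winner(text):
    """Which of NUMBER / Value the lexer would pick at the start of text."""
    n = number.match(text)
    v = value.match(text)
    n_len = n.end() if n else 0
    v_len = v.end() if v else 0
    if n_len == v_len == 0:
        return None  # neither rule matches here
    # Longest match wins; NUMBER is defined first, so it wins ties.
    if n_len >= v_len:
        return ("NUMBER", text[:n_len])
    return ("Value", text[:v_len])


print(winner("3 then"))   # ('NUMBER', '3'): Value cannot take a space right after one character
print(winner("10 then"))  # ('Value', '10 then'): the embedded space lets Value swallow the keyword
```

So a constraint such as IF [Size] > 10 THEN ... would likely hit the same kind of error as the original question, which is why tightening Value (for example, not allowing embedded spaces) is worth considering.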
Then there are no errors and you get the following tree: