I would like to parse two type of expression with boolean :
- the first would be an init expression with boolean like : init : false
- and the last one would be a derive expression with boolean like : derive : !express or (express and (amount >= 100))
My idea is to put semantic predicates in a set of rules, the goal is when I'm parsing a boolean expression beginning with the word 'init' then it has to go to only one alternative rule proposed who is boolliteral, the last alternative in boolExpression. And if it's an expression beginning with the word 'derive' then it could have access to all alternatives of boolExpression.
I know that I could make two type of boolExpression without semantic predicates like boolExpressionInit and boolExpressionDerive... But I would like to try with my idea if it's could work with a only one boolExpression with semantic predicates.
Here's my grammar
grammar TestExpression;
@header
{
package testexpressionparser;
}
@parser::members {
int vConstraintType;
}
/* SYNTAX RULES */
textInput : initDefinition
| derDefinition ;
initDefinition : t=INIT {vConstraintType = $t.type;} ':' boolExpression ;
derDefinition : t=DERIVE {vConstraintType = $t.type;} ':' boolExpression ;
boolExpression : {vConstraintType != INIT || vConstraintType == DERIVE}? boolExpression (boolOp|relOp) boolExpression
| {vConstraintType != INIT || vConstraintType == DERIVE}? NOT boolExpression
| {vConstraintType != INIT || vConstraintType == DERIVE}? '(' boolExpression ')'
| {vConstraintType != INIT || vConstraintType == DERIVE}? attributeName
| {vConstraintType != INIT || vConstraintType == DERIVE}? numliteral
| {vConstraintType == INIT || vConstraintType == DERIVE}? boolliteral
;
boolOp : OR | AND ;
relOp : EQ | NEQ | GT | LT | GEQT | LEQT ;
attributeName : WORD;
numliteral : intliteral | decliteral;
intliteral : INT ;
decliteral : DEC ;
boolliteral : BOOLEAN;
/* LEXICAL RULES */
INIT : 'init';
DERIVE : 'derive';
BOOLEAN : 'true' | 'false' ;
BRACKETSTART : '(' ;
BRACKETSTOP : ')' ;
BRACESTART : '{' ;
BRACESTOP : '}' ;
EQ : '=' ;
NEQ : '!=' ;
NOT : '!' ;
GT : '>' ;
LT : '<' ;
GEQT : '>=' ;
LEQT : '<=' ;
OR : 'or' ;
AND : 'and' ;
DEC : [0-9]* '.' [0-9]* ;
INT : ZERO | POSITIF;
ZERO : '0';
POSITIF : [1-9] [0-9]* ;
WORD : [a-zA-Z] [_0-9a-zA-Z]* ;
WS : (SPACE | NEWLINE)+ -> skip ;
SPACE : [ \t] ; /* Space or tab */
NEWLINE : '\r'? '\n' ; /* Carriage return and new line */
I except that the grammar would run successfully, but what i receive is : "error(119): TestExpression.g4::: The following sets of rules are mutually left-recursive [boolExpression]
1 error(s)
BUILD FAIL"
Apparently ANTLR4's support for (direct) left-recursion does not work when a predicate appears before a left-recursive rule invocation. So you can fix the error by moving the predicate after the first boolExpression
in the left-recursive alternatives.
That said, it seems like the predicates aren't really necessary in the first place - at least not in the example you've shown us (or the one before your edit as far as I could tell). Since a boolExpression
with the constraint type INIT
can apparently only match boolLiteral
, you can just change initDefinition
as follows:
initDefinition : t=INIT ':' boolLiteral ;
Then boolExpression
will always have the constraint type DERIVE
and no predicates are necessary anymore.
Generally, if you want to allow different alternatives in non-terminal x
based on whether it was invoked by y
or z
, you should simply have multiple versions of x
and then call one from y
and the other from z
. That's usually a lot less hassle than littering the code with actions and predicates.
Similarly it can also make sense to have a rule that matches more than it should and then detect illegal expressions in a later phase instead of trying to reject them at the syntax level. Specifically beginners often try to write grammars that only allow well-typed expressions (rejecting something like 1+true
with a syntax error) and that never works out well.