I'm working on a small translator in JISON, but I've run into a problem when trying to implement the cast of expressions, since it generates an ambiguity in the grammar when trying to add the production of cast. I need to add the productions to the cast option, so in principle I should have something like this:
expr: OPEN_PAREN type CLOSE_PAREN expr
However, since in my grammar I must be able to have expressions in parentheses, I already have the following production, so the grammar is now ambiguous:
expr: '(' expr ')'
Initially I had the following grammar for expressions:
expr : expr PLUS expr
| expr MINUS expr
| expr TIMESexpr
| expr DIV expr
| expr MOD expr
| expr POWER expr
| MINUS expr %prec UMINUS
| expr LESS_THAN expr
| expr GREATER_THAN expr
| expr LESS_OR_EQUAL expr
| expr GREATER_OR_EQUAL expr
| expr EQUALS expr
| expr DIFFERENT expr
| expr OR expr
| expr AND expr
| NOT expr
| OPEN_PAREN expr CLOSE_PAREN
| INT_LITERAL
| DOUBLE_LITERAL
| BOOLEAN_LITERAL
| CHAR_LITERAL
| STRING_LTIERAL
| ID;
Ambiguity was handled by applying the following precedence and associativity rules:
%left 'ASSIGNEMENT'
%left 'OR'
%left 'AND'
%left 'XOR'
%left 'EQUALS', 'DIFFERENT'
%left 'LESS_THAN ', 'GREATER_THAN ', 'LESS_OR_EQUAL ', 'GREATER_OR_EQUAL '
%left 'PLUS', 'MINUS'
%left 'TIMES', 'DIV', 'MOD'
%right 'POWER'
%right 'UMINUS', 'NOT'
I can't find a way to write a production that allows me to add the cast without falling into an ambiguity. Is there a way to modify this grammar without having to write an unambiguous grammar? Is there a way I can resolve this issue using JISON, which I may not have been able to see?
Any ideas are welcome.
This is what I was trying, however it's still ambiguous:
expr: OPEN_PAREN type CLOSE_PAREN expr
| OPEN_PAREN expr CLOSE_PAREN
The problem is that you don't specify the precedence of the cast operator, which is effectively a unary operator whose precedence should be the same as any other unary operator, such as NOT
. (See below for a discussion of UMINUS
.)
The parsing conflicts you received are not related to the fact that expr: '(' expr ')'
is also a production. That would prevent LL(1) parsing, because the two productions start with the same sequence, but that's not an ambiguity. It doesn't affect bottom-up parsing in any way; the two productions are unambiguously recognisable.
Rather, the conflicts are the result of the parser not knowing whether (type)a+b
means ((type)a+b
or (type)(a+b)
, which is no different from the ambiguity of unary minus (should -a/b
be parsed as (-a)/b
or -(a/b)
?), which is resolved by putting UMINUS
at the end of the precedence list.
In the case of casts, you don't need to use a %prec
declaration with a pseudo-token; that's only necessary for -
because -
could also be a binary operator, with a different (reduction) precedence. The precedence of the production:
expr: '(' type ')' expr
is )
(at least in yacc/bison), because that's the last terminal in the production. There's no need to give )
a shift precedence, because the grammar requires it to always be shifted.
Three notes:
Assignment is right-associative. a = b = 3
means a = (b = 3)
, not (a = b) = 3
.
In the particular case of unary minus (and, by extension, unary plus if you feel like implementing it), there's a good argument for putting it ahead of exponentiation, so that -a**b
is parsed as -(a**b)
. But that doesn't mean you should move other unary operators up from the end; (type)a**b
should be parsed as ((type)a)**b
. Nothing says that all unary operators have to have the same precedence.
When you add postfix operators -- notably function calls and array subscripts -- you will want to put them after the unary prefix operators. -a[3]
most certainly does not mean (-a)[3]
. These postfix operators are, in a way, duals of the prefix operators. As noted above, expr: '(' type ')' expr
has precedence ')'
, which is only used as a reduction precedence. Conversely, expr: expr '(' expr-list ')'
does not require a reduction precedence; the relevant token whose shift precedence needs to be declared is (
.
So, according to all the above, your precedence declarations might be:
%right ASSIGNMENT
%left OR
%left AND
%left XOR
%left EQUALS DIFFERENT
%left LESS_THAN GREATER_THAN LESS_OR_EQUAL GREATER_OR_EQUAL
%left PLUS MINUS
%left TIMES DIV MOD
%right UMINUS
%right POWER
%right NOT CLOSE_PAREN
%right OPEN_PAREN OPEN_BRACKET
I listed all the unary operators using right associativity, which is somewhat arbitrary; either %left
or %right
would have the same effect, since it is impossible for a unary operator to compete with another instance of the same operator for the same operand; for unary operators, only the precedence level makes any difference. But it's customary to mark unary operators with %right
.
Bison allows the use of %precedence
to declare precedence levels for operators which have no associativity, but Jison doesn't have that feature. Both Bison and Jison do allow the use of %nonassoc
, but that's very different: it says that it is a syntax error if either operand to the operator is an application of the same operator. That restriction is, for example, sometimes applied to comparison operators, in order to make a < b < c
a syntax error.