parsingcompiler-constructiongrammarcontext-free-grammarjison

Remove ambiguity in grammar for expression casting


I'm working on a small translator in JISON, but I've run into a problem when trying to implement the cast of expressions, since it generates an ambiguity in the grammar when trying to add the production of cast. I need to add the productions to the cast option, so in principle I should have something like this:

expr: OPEN_PAREN type CLOSE_PAREN expr

However, since in my grammar I must be able to have expressions in parentheses, I already have the following production, so the grammar is now ambiguous:

expr: '(' expr ')'

Initially I had the following grammar for expressions:

expr                :       expr PLUS expr
                    |       expr MINUS expr
                    |       expr TIMESexpr
                    |       expr DIV expr
                    |       expr MOD expr
                    |       expr POWER expr
                    |       MINUS expr %prec UMINUS
                    |       expr LESS_THAN expr
                    |       expr GREATER_THAN expr
                    |       expr LESS_OR_EQUAL expr
                    |       expr GREATER_OR_EQUAL expr
                    |       expr EQUALS expr
                    |       expr DIFFERENT expr
                    |       expr OR expr
                    |       expr AND expr
                    |       NOT expr
                    |       OPEN_PAREN expr CLOSE_PAREN
                    |       INT_LITERAL
                    |       DOUBLE_LITERAL
                    |       BOOLEAN_LITERAL
                    |       CHAR_LITERAL
                    |       STRING_LTIERAL
                    |       ID;

Ambiguity was handled by applying the following precedence and associativity rules:

%left                       'ASSIGNEMENT'
%left                       'OR'
%left                       'AND'
%left                       'XOR'
%left                       'EQUALS', 'DIFFERENT'
%left                       'LESS_THAN ', 'GREATER_THAN ', 'LESS_OR_EQUAL ',  'GREATER_OR_EQUAL '
%left                       'PLUS', 'MINUS'
%left                       'TIMES', 'DIV', 'MOD'
%right                      'POWER'
%right                      'UMINUS', 'NOT'

I can't find a way to write a production that allows me to add the cast without falling into an ambiguity. Is there a way to modify this grammar without having to write an unambiguous grammar? Is there a way I can resolve this issue using JISON, which I may not have been able to see?

Any ideas are welcome.

This is what I was trying, however it's still ambiguous:

expr: OPEN_PAREN type CLOSE_PAREN expr
    | OPEN_PAREN expr CLOSE_PAREN

Solution

  • The problem is that you don't specify the precedence of the cast operator, which is effectively a unary operator whose precedence should be the same as any other unary operator, such as NOT. (See below for a discussion of UMINUS.)

    The parsing conflicts you received are not related to the fact that expr: '(' expr ')' is also a production. That would prevent LL(1) parsing, because the two productions start with the same sequence, but that's not an ambiguity. It doesn't affect bottom-up parsing in any way; the two productions are unambiguously recognisable.

    Rather, the conflicts are the result of the parser not knowing whether (type)a+b means ((type)a+b or (type)(a+b), which is no different from the ambiguity of unary minus (should -a/b be parsed as (-a)/b or -(a/b)?), which is resolved by putting UMINUS at the end of the precedence list.

    In the case of casts, you don't need to use a %prec declaration with a pseudo-token; that's only necessary for - because - could also be a binary operator, with a different (reduction) precedence. The precedence of the production:

    expr: '(' type ')' expr
    

    is ) (at least in yacc/bison), because that's the last terminal in the production. There's no need to give ) a shift precedence, because the grammar requires it to always be shifted.

    Three notes:

    1. Assignment is right-associative. a = b = 3 means a = (b = 3), not (a = b) = 3.

    2. In the particular case of unary minus (and, by extension, unary plus if you feel like implementing it), there's a good argument for putting it ahead of exponentiation, so that -a**b is parsed as -(a**b). But that doesn't mean you should move other unary operators up from the end; (type)a**b should be parsed as ((type)a)**b. Nothing says that all unary operators have to have the same precedence.

    3. When you add postfix operators -- notably function calls and array subscripts -- you will want to put them after the unary prefix operators. -a[3] most certainly does not mean (-a)[3]. These postfix operators are, in a way, duals of the prefix operators. As noted above, expr: '(' type ')' expr has precedence ')', which is only used as a reduction precedence. Conversely, expr: expr '(' expr-list ')' does not require a reduction precedence; the relevant token whose shift precedence needs to be declared is (.

    So, according to all the above, your precedence declarations might be:

    %right          ASSIGNMENT
    %left           OR
    %left           AND
    %left           XOR
    %left           EQUALS DIFFERENT
    %left           LESS_THAN GREATER_THAN LESS_OR_EQUAL GREATER_OR_EQUAL 
    %left           PLUS MINUS
    %left           TIMES DIV MOD
    %right          UMINUS
    %right          POWER
    %right          NOT CLOSE_PAREN
    %right          OPEN_PAREN OPEN_BRACKET
    

    I listed all the unary operators using right associativity, which is somewhat arbitrary; either %left or %right would have the same effect, since it is impossible for a unary operator to compete with another instance of the same operator for the same operand; for unary operators, only the precedence level makes any difference. But it's customary to mark unary operators with %right.

    Bison allows the use of %precedence to declare precedence levels for operators which have no associativity, but Jison doesn't have that feature. Both Bison and Jison do allow the use of %nonassoc, but that's very different: it says that it is a syntax error if either operand to the operator is an application of the same operator. That restriction is, for example, sometimes applied to comparison operators, in order to make a < b < c a syntax error.