Tags: parsing, compiler-construction, shift-reduce-conflict, sablecc

SableCC shift/reduce conflicts on productions with identifiers


I'm trying to write a SableCC specification file for a version of minipython (with postfix/prefix increment and decrement operators). Some productions naturally need to use identifiers, but SableCC reports these conflicts when it generates the parser:

shift/reduce conflict in state [stack: TPrint TIdentifier *] on TPlusPlus in {
    [ PMultiplication = TIdentifier * ] followed by TPlusPlus (reduce),
    [ PPostfix = TIdentifier * TPlusPlus ] (shift)
}

shift/reduce conflict in state [stack: TPrint TIdentifier *] on TMinusMinus in {
    [ PMultiplication = TIdentifier * ] followed by TMinusMinus (reduce),
    [ PPostfix = TIdentifier * TMinusMinus ] (shift)
}

shift/reduce conflict in state [stack: TPrint TIdentifier *] on TLPar in {
    [ PFunctionCall = TIdentifier * TLPar PArglist TRPar ] (shift),
    [ PFunctionCall = TIdentifier * TLPar TRPar ] (shift),
    [ PMultiplication = TIdentifier * ] followed by TLPar (reduce)
}

shift/reduce conflict in state [stack: TPrint TIdentifier *] on TLBr in {
    [ PExpression = TIdentifier * TLBr PExpression TRBr ] (shift),
    [ PMultiplication = TIdentifier * ] followed by TLBr (reduce),
    [ PPostfix = TIdentifier * TLBr PExpression TRBr TMinusMinus ] (shift),
    [ PPostfix = TIdentifier * TLBr PExpression TRBr TPlusPlus ] (shift)
}
java.lang.RuntimeException:

I started from a given BNF of the language and arrived at the following. Here is the grammar file:

Productions
goal = {prgrm}program* ;

program = {func}function | {stmt}statement;

function = {func}def identifier l_par argument? r_par semi statement ;

argument = {arg} identifier assign_value? subsequent_arguments* ;

assign_value = {assign} eq value ;

subsequent_arguments = {more_args} comma identifier assign_value? ;

statement = {case1}tab* if comparison semi statement
          | {case2}tab* while comparison semi statement
          | {case3}tab* for [iterator]:identifier in [collection]:identifier semi statement
          | {case4}tab* return expression
          | {case5}tab* print expression more_expressions
          | {simple_equals}tab* identifier eq expression
          | {add_equals}tab* identifier add_eq expression
          | {minus_equals}tab* identifier sub_eq expression
          | {div_equals}tab* identifier div_eq expression
          | {case7}tab* identifier l_br [exp1]:expression r_br eq [exp2]:expression
          | {case8}tab* function_call;

comparison = {less_than} comparison less relation
           | {greater_than} comparison great relation
           | {rel} relation;

relation = {relational_value} relational_value
         | {logic_not_equals} relation logic_neq relational_value
         | {logic_equals} relation logic_equals relational_value;

relational_value = {expression_value} expression_value
      | {true} true
      | {false} false;

expression = {case1} arithmetic_expression
           | {case2} prefix
           | {case4} identifier l_br expression r_br
           | {case9} l_br more_values r_br;

more_expressions = {more_exp} expression subsequent_expressions*;

subsequent_expressions = {more_exp} comma expression;

arithmetic_expression = {plus} arithmetic_expression plus multiplication
         | {minus} arithmetic_expression minus multiplication
         | {multiplication} multiplication ;

multiplication = {expression_value} expression_value
         | {div} multiplication div expression_value
         | {mult} multiplication mult expression_value;

expression_value = {exp} l_par expression r_par
                 | {function_call} function_call
                 | {value} value
                 | {identifier} identifier ;

prefix = {pre_increment} plus_plus prepost_operand
       | {pre_decrement} minus_minus prepost_operand
       | {postfix} postfix;

postfix = {post_increment} prepost_operand plus_plus
        | {post_decrement} prepost_operand minus_minus;  

prepost_operand = {value} identifier l_br expression r_br
                 | {identifier} identifier;

function_call = {args} identifier l_par arglist? r_par;

arglist = {arglist} more_expressions ;

value = {number} number
      | {string} string ;

more_values = {more_values} value subsequent_values* ;

subsequent_values = comma value ;

number = {int} numeral              
       | {float} float_numeral ;

Here identifier is of course a token, and the problematic productions that use it are function_call, prepost_operand and expression_value. As an experiment I removed prefix, postfix and prepost_operand to see whether the conflicts would at least change a little, but that just leaves the last two conflicts. Is there any way I can resolve these conflicts without changing the grammar much, or have I gone down a completely wrong path?


Solution

  • The problem is the production whose right-hand side is:

    print expression more_expressions
    

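    more_expressions matches a list of expressions (so it would be less confusing to call it expression_list). Two consecutive expressions in a rule, with no separator between them, are ambiguous: if you could have two expressions, would 1+1+1 be 1+1 followed by +1, or 1 followed by +1+1? Your grammar hits the same problem with the tokens named in the conflict report: print a ++ b could be a++ followed by b, or a followed by ++b. What you want is just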

    print more_expressions
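
    In terms of the grammar file from the question, the change would look roughly like the fragment below. This is a sketch that assumes every other production stays as posted. It has not been run through SableCC, so other conflicts may still surface once these are gone.

    ```
    // Before: one expression directly followed by a list of expressions.
    //        | {case5}tab* print expression more_expressions
    // After: the list alone, which already requires at least one expression.
              | {case5}tab* print more_expressions

    // Unchanged list productions, repeated here for reference.
    more_expressions = {more_exp} expression subsequent_expressions*;

    subsequent_expressions = {more_exp} comma expression;
    ```

    With this change, two printed expressions are always separated by a comma, so after print identifier the parser no longer has to guess whether ++, --, ( or [ continues the current expression or starts the next one.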