parsingcompiler-constructiongrammarjflexcup

Ambiguous Context-free grammar? / Shift/Reduce conflict in CUP


I have the following context-free grammar for a simplified version of C++. When I run it with JFLEX and CUP I get a list of errors like that:

Warning : *** Reduce/Reduce conflict found in state #173
  between especificador ::= (*) 
  and     programa ::= (*) 
  under symbols: {VOID, CHAR, FLOAT, DOUBLE, SIGNED, UNSIGNED, INT, SHORT, LONG}
  Resolved in favor of the second production.

I think the problem is with instrucoesIf but I can't figure it out

start with programa;

programa ::= especificador tipo ID programa2 | DEFINE ID num CRLF programa | ; verificar depois o ERRO
especificador ::= AUTO | STATIC | EXTERN | CONST | ;
tipo ::= VOID | CHAR | FLOAT | DOUBLE | SIGNED inteiro | UNSIGNED inteiro | inteiro;

inteiro ::= SHORT | INT | LONG;
programa2 ::= SEMICOLON programa | LBRACK num RBRACK SEMICOLON programa | LPAREN listaParametros RPAREN bloco programa | COMMA listaID programa;

listaID ::= ID declaracaoParam2 listaIDTail;
listaIDTail ::= SEMICOLON | COMMA listaID;
listaParametros ::= listaParamRestante | ;
listaParamRestante ::= declaracaoParam declParamRestante;
declaracaoParam ::= tipo ID declaracaoParam2;
declaracaoParam2 ::= LBRACK num RBRACK | ;
declParamRestante ::= COMMA listaParamRestante | ;
bloco ::= LBRACE conjuntoInst RBRACE | SEMICOLON conjuntoInst ;
conjuntoInst ::= programa conjuntoInst | instrucoes conjuntoInst | ;
instrucoes ::= ID expressao SEMICOLON | RETURN expr SEMICOLON | PRINTF LPAREN expr RPAREN SEMICOLON | SCANF LPAREN ID RPAREN SEMICOLON | BREAK SEMICOLON | IF LPAREN expr RPAREN instrucoes instrucoesIf;

instrucoesIf ::= ELSE instrucoes | ;
expressao ::= atribuicao | LBRACK expr RBRACK atribuicao | LPAREN exprList RPAREN | ;
atribuicao ::= operadorAtrib expr;
operadorAtrib ::= EQ | MULTEQ | DIVEQ | MODEQ | PLUSEQ | MINUSEQ;
expr ::= exprAnd exprOr;
exprList ::= expr exprListTail | ;
exprListTail ::= COMMA exprList | ;
exprOr ::= OR exprAnd exprOr | ;
exprAnd ::= exprEqual exprAnd2;
exprAnd2 ::= AND exprEqual exprAnd2 | ;
exprEqual ::= exprRelational exprEqual2;
exprEqual2 ::= EQEQ exprRelational exprEqual2 | NOTEQ exprRelational exprEqual2 | ;
exprRelational ::= exprPlus exprRelational2;
exprRelational2 ::= LT exprPlus exprRelational2 | LTEQ exprPlus exprRelational2 | GT exprPlus exprRelational2 | GTEQ exprPlus exprRelational2 | ;

exprPlus ::= exprMult exprPlus2;
exprPlus2 ::= PLUS exprMult exprPlus2 | MINUS exprMult exprPlus2 | ;
exprMult ::= exprUnary exprMult2;
exprMult2 ::= MULT exprUnary exprMult2 | DIV exprUnary exprMult2 | ;
exprUnary ::= PLUS exprParenthesis | MINUS exprParenthesis | exprParenthesis;
exprParenthesis ::= LPAREN expr RPAREN | primary;
primary ::= ID primaryID | num | literal;
primaryID ::= LBRACK primary RBRACK | LPAREN exprList RPAREN | ;

literal ::= STRING | CHAR;
num ::= NUM_INT | NUM_FLOAT;

Solution

  • CUP is a LALR parser generator, which means that there is no need to avoid left-recursion and so you can use a more natural grammatical style; there is no need to split lists into "start" and "continuation" productions, which are hard to read and error-prone, and furthermore often do not allow the direct construction of an accurate syntax tree.

    There are lots of conflicts in your grammar, but this one is clearly ambiguous:

    conjuntoInst ::= programa conjuntoInst | instrucoes conjuntoInst | ;
    

    Note that programa also has an empty alternative. So there could be any number of empty programas at the beginning (or, indeed, in the middle) of a conjuntoInst. You need to be very careful with non-terminals which can derive nothing; you must avoid ever using them in an undelimited list, because the parser cannot tell how many consecutive empty non-terminals are present in the source text.

    Note that completely empty statements are not allowed in C++, nor in C. A completely empty statement would produce exactly this kind of ambiguity. An empty statement must still be terminated with a ;. This makes it possible to have lists of statements.

    So the usual model for C statements (including blocks) (with lots of details left out) is:

    block ::= '{' statementList '}' ;
    statementList ::= | statementList statement ;
    statement ::= emptyStatement | expressionStatement | ifStatement | ...
                | declaration
    emptyStatement ::= ';'
    expressionStatement ::= expression ';'
       ...
    ifStatement ::= IF '(' expression ')' statement
                  | IF '(' expression ')' statement ELSE statement
    declaration ::= modifiers type ID ...