I am writing a parser using a context-free grammar with Bison and Flex to handle a simple language that includes for loops. However, I'm encountering a syntax error when I try to parse valid for loop statements. Below are my .l and .y files, along with the output and error message I'm getting.
Flex File (for.l):
%option noyywrap
%{
#include "for2.tab.h" // Include the header generated by Bison
%}
alpha [A-Za-z]
digit [0-9]
%%
[ \t\n] ; // Ignore whitespace
for return FOR;
printf return ID;
{digit}+ { yylval = atoi(yytext); return NUM; }
{alpha}({alpha}|{digit})* { return ID; }
\"([^\\\"]|\\.)*\" { return STRING; }
"++" return INC;
"--" return DEC;
"<=" return LE;
">=" return GE;
"==" return EQ;
"!=" return NE;
"||" return OR;
"&&" return AND;
[-+*/()=<>!;{},%] return yytext[0];
. { fprintf(stderr, "Unrecognized character: %s\n", yytext); }
%%
Bison File (for.y):
%{
#include <stdio.h>
#include <stdlib.h>
int yylex(void);
void yyerror(const char *s) {
fprintf(stderr, "Error: %s\n", s);
}
%}
%token ID NUM FOR STRING
%token LE GE EQ NE OR AND
%token INC DEC
%left OR
%left AND
%left EQ NE
%left '<' '>' LE GE
%%
Program : Statement { printf("Input accepted\n"); exit(0); }
;
Statement : ForStatement
| Expression ';'
| FunctionCall ';'
;
ForStatement: FOR '(' AssignmentExpression ';' LogicalExpression ';' IncrementExpression ')' Block
;
Block : '{' StatementList '}'
| Statement
| /* empty */
;
StatementList : StatementList Statement
| Statement
;
Expression : AssignmentExpression
| ArithmeticExpression
;
AssignmentExpression
: ID '=' ArithmeticExpression
;
ArithmeticExpression
: ArithmeticExpression '+' Term
| ArithmeticExpression '-' Term
| Term
;
Term : Term '*' Factor
| Term '/' Factor
| Factor
;
Factor : '(' ArithmeticExpression ')'
| ID
| NUM
| '-' Factor
;
IncrementExpression
: ID INC
| ID DEC
| AssignmentExpression
;
LogicalExpression
: LogicalExpression OR LogicalTerm
| LogicalTerm
;
LogicalTerm : LogicalTerm AND LogicalFactor
| LogicalFactor
;
LogicalFactor
: RelationalExpression
| '!' LogicalFactor
| '(' LogicalExpression ')'
;
RelationalExpression
: ArithmeticExpression '<' ArithmeticExpression
| ArithmeticExpression '>' ArithmeticExpression
| ArithmeticExpression LE ArithmeticExpression
| ArithmeticExpression GE ArithmeticExpression
| ArithmeticExpression EQ ArithmeticExpression
| ArithmeticExpression NE ArithmeticExpression
;
FunctionCall: ID '(' ArgumentList ')'
;
ArgumentList: Expression
| ArgumentList ',' Expression
| STRING
| /* empty */
;
%%
int main() {
printf("Enter the expression:\n");
yyparse();
return 0;
}
Output and Error:
When I try to parse the following input:
for ( i = 0 ; i < 5 ; i++ ) { a = a + i; }
I get the following output:
Enter the expression:
for ( i = 0 ; i < 5 ; i ++ ) { a = a + i; }
(null) = 0
Error: syntax error
On running the following command:
bison -dv for2.y
i get:
for2.y: conflicts: 5 shift/reduce
My Question: I don't understand why the parser is throwing a syntax error. The for loop input looks valid, and I've already defined the necessary tokens and rules to handle for loops and increment expressions like i++.
I've tried various changes, but the error persists. Could someone point out what might be wrong with my grammar or Flex/Bison setup? Any help would be appreciated!
Because you have shift-reduce conflicts, the grammar cannot be recognized by an LALR(1) parser. As a result, bison will resolve the conflicts by choosing to shift (rather than reduce) in each conflicting state, resulting in a parser that recognizes a SUBSET of the language specified by the grammar.
In order to see exactly what is happening, you need to examine the states with conflicts to see how the resolution of the conflicts affect how the parser recognizes things.
With the code you've posted, when you look in for2.output, you'll see those 5 shift/reduce conflicts in State 77:
State 77
5 ForStatement: FOR '(' AssignmentExpression ';' LogicalExpression ';' IncrementExpression ')' . Block
ID shift, and go to state 1
NUM shift, and go to state 2
FOR shift, and go to state 3
'(' shift, and go to state 4
'{' shift, and go to state 78
'-' shift, and go to state 5
ID [reduce using rule 8 (Block)]
NUM [reduce using rule 8 (Block)]
FOR [reduce using rule 8 (Block)]
'(' [reduce using rule 8 (Block)]
'-' [reduce using rule 8 (Block)]
$default reduce using rule 8 (Block)
which is noting the ambiguity that comes from have the rule Block: /* empty */
which does seem like an error in your grammar, but which should not cause the result you report. The resolution (which is mostly equivalent to simply deleting that rule) should work fine for this one case, though you might prefer changing it to Block: ';'
.
When I run your program, it works just fine:
> ./for2
Enter the expression:
for ( i = 0 ; i < 5 ; i ++ ) { a = a + i; }
Input accepted
>
which implies that what you posted is not exactly the same as what you are running.
I can reproduce your result if I replace the Program: Statement
rule with Program: StatementList
(allowing one or more statements as the input rather than exiting after a single statement).
In that case, it is the second line of input ((null) = 0;
) causing the syntax error, as your AssignmentExpression
rule only allows a single ID
as the lvalue, not an lvalue expression. If you change that rule to
AssignmentExpression
: ArithmeticExpression '=' ArithmeticExpression
then it works fine (with the added ;
in the input)