grammarbisoncontext-free-grammarcontext-sensitive-grammar

bison grammar rules for a custom pascal like language


I'm trying to make a compiler for a custom pascal like language using bison and flex and I end up getting syntax errors for programs that should be correct according to my custom grammar.

My custom grammar:

<program>       ::=     program id 
<block>
<block>     ::=     { 
<sequence>
}
<sequence>      ::=     <statement> ( ; <statement> )*
<brackets-seq>  ::=     { <sequence> }
<brack-or-stat> ::=     <brackets-seq> | 
<statement>
<statement>         ::=     ε |
                <assignment-stat> |
                <if-stat> |
                <while-stat> 
<assignment-stat>   ::=     id := <expression>
<if-stat>       ::=     if (<condition>) 
<brack-or-stat>
<elsepart>
<elsepart>      ::=     ε | 
else <brack-or-stat>
<while-stat>        ::=     while (<condition>) 
<brack-or-stat>
<expression>        ::=     <optional-sign> <term> ( <add-oper> <term>)* 
<term>      ::=     <factor>  (<mul-oper> <factor>)*
<factor>        ::=     constant | 
(<expression>) |  
id
<condition>     ::=     <boolterm> (and <boolterm>)*
<boolterm>      ::=     <boolfactor> (or <boolfactor>)*
<boolfactor>        ::= not [<condition>] | 
[<condition>] | 
                <expression> <relational-oper> <expression> 
<relational-oper>   ::=     == | < | > | <> | <= | >=
<add-oper>      ::=     + | -
<mul-oper>      ::=     * | /
<optional-sign>     ::=     ε | <add-oper> 

My grammar implementation on bison:

%{
    #include <stdio.h>
    #include <string.h>
    int yylex(void);
    void yyerror(char *s);
%}

%union {
    int i;
    char *s;
};

%token <i> INTEGERNUM

%token PROGRAM;
%token OR;
%token AND;
%token NOT;
%token IF;
%token ELSE;
%token WHILE;
%token PLUS;
%token MINUS;
%token MUL;
%token DIV;
%token LSB;
%token RSB;
%token LCB;
%token RCB;
%token LEFTPAR;
%token RIGHTPAR;
%token ID;
%token INT;
%token ASSIGN;
%token ISEQUAL;
%token LTHAN;
%token GTHAN;
%token NOTEQUAL;
%token LESSEQUAL;
%token GREATEREQUAL;

%left '+' '-'
%left '*' '/'

%%

program:
        PROGRAM ID block
        ;

block:
        LCB RCB
        |LCB sequence RCB
        ;

sequence:
        statement ';'sequence
        |statement ';'
        ;

bracketsSeq:
        LCB sequence RCB
        ;

brackOrStat:        
        bracketsSeq
        |statement
        ;

statement:
        assignmentStat
        |ifStat
        |whileStat
        |
        ;

assignmentStat:
        ID ':=' expression

ifStat:
        IF LEFTPAR condition RIGHTPAR brackOrStat elsepart
        ;

elsepart:
        ELSE brackOrStat
        |
        ;

whileStat:
        WHILE LEFTPAR condition RIGHTPAR brackOrStat
        ;

expression:
        addOper expression
        |expression addOper expression
        |term
        ;

term:
        term mulOper term
        |factor
        ;

factor:
        INT
        |LEFTPAR expression RIGHTPAR
        |ID
        ;

condition:
        condition AND condition
        |boolterm
        ;

boolterm:
        boolterm OR boolterm
        |boolfactor
        ;

boolfactor:
        NOT LSB condition RSB
        |LSB condition RSB
        |expression relationalOper expression
        ;

relationalOper:
        ISEQUAL
        |LTHAN
        |GTHAN
        |NOTEQUAL
        |LESSEQUAL
        |GREATEREQUAL
        ;

addOper:
        PLUS
        |MINUS
        ;

mulOper:
        MUL
        |DIV
        ;

optionalSign

        |addOper
        ;


%%

int main( int argc, char **argv )
{
             extern FILE *yyin;
             ++argv, --argc;  /* skip over program name */
             if ( argc > 0 )
                     yyin = fopen( argv[0], "r" );
             else
                     yyin = stdin;

             do
                yyparse();
            while(!feof(yyin));
}       

My flex implementation is pretty straightforward where I just return tokens for each symbol or identifier needed.

Using my implementation on the following simple program:

program circuit
{
    a:=b;
}

I end up getting a syntax error. Specifically when the parsing reaches the point right after := according to my debugging prints I use:

$ ./a.exe verilog.txt
text = program
text = circuit val = circuit
text = {
text = a val = a
text = :=
syntax error

This is the first time I use flex and bison so I'm guessing that I made a wrong implementation of my original grammar to bison since after the ./bison.exe -dy comp.y command I get:

bison conflicts 64 shift/reduce

Any ideas would be helpful. Thanks!


Solution

  • This rule :

    assignmentStat: ID ':=' expression
    

    uses a token ':=' which bison gives a code distinct from any other token, and which your lexer has no way of knowing, so you're almost certainly not returning it. You're probably returning ASSIGN for the character sequence ':=', so you want:

    assignmentStat: ID ASSIGN expression
    

    For the shift-reduce conflicts, they mean that the parser doesn't match exactly the language you specified, but rather some subset (as determined by the default shift instead of reduce). You can use bison's -v option to get a complete printout of the parser state machine (including all the conflicts) in a .output file. You can then examine the conflicts and determine how you should change the grammar to match what you want.

    When I run bison on your example, I see only 9 shift/reduce conflicts, all arising from expr: expr OP expr-style rules, which are ambiguous (may be either right- or left- recursive). The default resolution (shift) makes them all right-recursive, which may not be what you want. You can either change the grammar to not be ambiguous, or use bison's built-in precedence resolution tools to resolve them.