parsingcompiler-constructionlemon

Bison/YACC vs. Lemon vs. standard input


I'm trying to convert a calculator from Bison to Lemon.

I ran into an unexpected problem involving standard input where the two programs behave quite differently. The Bison version prints the result immediately after pressing [Enter]. With the Lemon version, the result is delayed until I type a new expression and press [Enter].

I've created tiny Bison and Lemon grammars and Flex scanners to illustrate the problem. This is on Windows 7, using a July 2014 version of Lemon, Bison 2.41, and gcc (tdm64-2) 4.8.1.

A simple session with the Bison version

Bison version session

Notice how the result is returned after pressing [Enter] after the simple expression.

A simple session with the Lemon version

Lemon version session

Notice how the result is only returned after the entering of a second expression and pressing of [Enter] (ctrl Z signals end-of-input for cmd.exe).

What have I done wrong?

Bison/Flex version source

badd.l:

%{
    #include "y.tab.h"
    #include <stdlib.h>
%}

%%
[0-9]+      {yylval = atoi(yytext); return INTEGER;}
[+]         return PLUS;
[\n]        return NL;
[ \t]       ;       /* skip whitespace */
.           {printf("Unknown character '%c'\n", yytext[0]); return 0;}
%%

int yywrap(void) {
    return 1;
}

badd.y:

%{
    #include <stdio.h>
    int yylex(void);
    void yyerror(char *);
%}

%token INTEGER PLUS NL
%left PLUS MINUS

%%

prog:   prog expr NL                { printf("%d\n", $2); }
        |
        ;
expr:   INTEGER                     { $$ = $1; }
        | expr PLUS expr            { $$ = $1 + $3; }
        ;
%%

void yyerror(char *s) {
    fprintf(stderr, "%s\n", s);
}

int main(void) {
    yyparse();
    return 0;
}

To build:

bison -y -d badd.y
flex badd.l
gcc y.tab.c lex.yy.c -o badd.exe

Lemon/Flex version source

ladd.l

%{
    #include "ladd.h"
    #include <stdlib.h>
    extern int yylval;
%}

%%
[0-9]+      {yylval = atoi(yytext); return INTEGER;}
[+]         return PLUS;
[\n]        return NL;
[ \t]       ;       /* skip whitespace */
.           {printf("Unknown character '%c'\n", yytext[0]); return 0;}
%%

int yywrap(void) {
    return 1;
}

ladd.y:

%include { #include <assert.h> }
%syntax_error { printf("Lemon syntax error\n"); }
%token_type {int}
%left PLUS MINUS .

start   ::= prog .

prog    ::= prog expr(a) NL .           { printf("%d\n", a); }
prog    ::= .

expr(a) ::= INTEGER(b) .                { a = b; }
expr(a) ::= expr(b) PLUS expr(c) .      { a = b + c; }

main.c:

#include <stdio.h>
#include <stdlib.h>

void *ParseAlloc(void *(*mallocProc)(size_t));
void ParseFree(void *p, void (*freeProc)(void*));
void Parse(void *yyp, int yymajor, int foo);

int yylex(void);
int yylval;

int main(void) {
   void *pParser;
   int tok;

   pParser = ParseAlloc(malloc);

   while ((tok = yylex()) != 0) {
      Parse(pParser, tok, yylval);
   }
   Parse(pParser, 0, 0);
   ParseFree(pParser, free );

   return 0;
}

To build:

lemon ladd.y
flex ladd.l
gcc main.c ladd.c lex.yy.c -o ladd.exe

Solution

  • A Bison LALR(1) parser will reduce immediately if there is only one possible reduce action and no possible shift actions. (That doesn't mean that all lookahead tokens have the same reduce action. Some may be errors, but the reduction will happen anyway.)

    Lemon does not implement this optimisation. It always requires a lookahead token. (However, it also does table compression IIRC, so it also may do a reduction even though the lookahead token indicates that the input is not well-formed. That's a feature of LALR(1) parsing.)

    The key to solving the problem is to ensure that the reduction which prints the expression value is performed with the newline as a lookahead token. In Yacc or Bison, you could do this with a mid-rule action, but Lemon doesn't implement these, so you need to add a unit rule in order to trigger the action, something like this:

    start   ::= prog .
    
    prog    ::= prog print NL .
    prog    ::= .
    
    print   ::= expr(a) .         { printf("%d\n", a); }
    

    Here, the reduction from expr to print is solely for the purpose of printing the value of the expression.

    This solution will work just fine with Yacc or Bison as well, by the way. It is arguably better than relying on Bison's lookahead optimization, which is not, as far as I know, guaranteed to work in all circumstances.