
Lemon takes lempar.c and outputs garbage at end of file


I'm using the Lemon parser generator, and for some reason it writes a bunch of garbage at the end of the generated file instead of replacing the %% lines from lempar.c with the generated code. I copied lemon.c and lempar.c directly from the SQLite source. Here's my grammar file:

%token_prefix TK_
%token_type {const char*}
%extra_argument { HiqupElem elem }

%syntax_error {
    printf("Hit syntax error, not sure..\n");
}

%stack_overflow {
    printf("Stackoverflow.com\n");
}

%name hiqupParser

%include {
    #include <stdio.h>
    #include <assert.h>
    #include "types.h"
}


%start_symbol start

start ::= in .
in(A) ::= in expression(B) SEMICOLON . { printf("Found expression %s, %s!\n", A, B); }

expression(A) ::= STRING(B) . { A = B }
expression(A) ::= NUMBER(B) . { A = B }

And here's the garbage it appends to the end of a copy of lempar.c:

  "$",             "SEMICOLON",     "STRING",        "NUMBER",      
  "error",         "start",         "in",            "expression",  
 /*   0 */ "in ::= in expression SEMICOLON",
 /*   1 */ "expression ::= STRING",
 /*   2 */ "expression ::= NUMBER",
 /*   3 */ "start ::= in",
#line 9 "compiler.y"

    printf("Stackoverflow.com\n");
#line 1025 "compiler.c"
  { 6, 3 },
  { 7, 1 },
  { 7, 1 },
  { 5, 1 },
        YYMINORTYPE yylhsminor;
      case 0: /* in ::= in expression SEMICOLON */
#line 25 "compiler.y"
{ printf("Found expression %s, %s!\n", yymsp[-2].minor.yy0, yymsp[-1].minor.yy0); }
#line 1034 "compiler.c"
        break;
      case 1: /* expression ::= STRING */
      case 2: /* expression ::= NUMBER */ yytestcase(yyruleno==2);
#line 27 "compiler.y"
{ yylhsminor.yy0 = yymsp[0].minor.yy0 }
#line 1040 "compiler.c"
  yymsp[0].minor.yy0 = yylhsminor.yy0;
        break;
      default:
      /* (3) start ::= in */ yytestcase(yyruleno==3);
        break;
#line 5 "compiler.y"

    printf("Hit syntax error, not sure..\n");
#line 1049 "compiler.c"

Solution

  • Lemon expects the template file lempar.c to have exactly 15 sections, separated by lines starting with %%. (The number 15 is probably subject to change.) Between these sections, it inserts code generated from the grammar description.

    The function which reads the template does not do a lot of error checking. It simply reads until it hits EOF or finds a line starting with two percent signs:

    while( fgets(line,LINESIZE,in) && (line[0]!='%' || line[1]!='%') ){
      // ...
    }
    

    So if there are fewer than 15 sections, it will just take the missing ones to be empty.

    It turns out that your IDE reindented the downloaded files, including many of the %% separator lines that happen to fall inside bracketed blocks. Because the check above looks only at the first two characters of each line, an indented %% is no longer recognized as a separator. So most of the generated text is inserted in the wrong place, and many of the %% lines are retained, where they will trigger syntax errors.

    For what it's worth, I don't see any practical value in using an IDE to download source files. On the lemon starting page there are links to lemon.c and lempar.c; each of those pages has a Download link (in the light blue bar near the top). From most browsers you can download the file simply by right-clicking on the link and choosing "Save as...". Or you could copy the link address and download it with curl (which is what I did) or wget. (I didn't put a link to the downloadable file here because the link is versioned and you will probably want to use the latest version.)

    Then you only need to compile lemon.c (c99 -Wall -O2 -o lemon lemon.c) and put a copy of lempar.c in the directory from which you run lemon. (Or you can specify the location of lempar.c using the -T option.)