I'm having a problem parsing a simple array of elements with Lemon. Can someone enlighten me??
I'm trying to parse this string "[0 0 612 792][100 200]" with mygrammar definition and the parser always skips the first array element and duplicate the last... any idea??
The grammar file is
%token_type { char* }
%include {
#include <assert.h>
}
%parse_accept { printf("The parser has completed successfully.\n"); }
%syntax_error { fprintf(stderr, "Syntax Error\n"); }
%parse_failure { fprintf(stderr, "Parse failure\n"); }
%start_symbol program
array_value ::= INT_LITERAL(A). {printf("Av: %s\n", A);}
array_value_list ::=.
array_value_list ::= array_value_list array_value.
array_declaration ::= LBRACKET array_value_list RBRACKET.
array_list ::= array_declaration.
array_list ::= array_list array_declaration.
program ::= array_list END_TOKEN.
I use re2c to get the tokens and the code that call the parser for each token is
while(token = scan(&scanner, buff_end)) {
// Send strings to the parser with NAME tokens
if(token == INT_LITERAL) {
name_length = scanner.cur - scanner.top;
strncpy(name_str, scanner.top, name_length);
name_str[name_length] = '\0';
//printf("Token:Pre: %s\tName: %s\n", tokenStr(token),name_str);
Parse(parser, token, name_str);
}
else {
//printf("Token: %s\n", tokenStr(token));
Parse(parser, token, 0);
}
// Execute Parse for the last time
if(token == END_TOKEN) {
Parse(parser, 0, NULL);
break;
}
}
For the input string "[ 0 -100 612 792][100 200]" the output is:
Av: -100
Av: 612
Av: 792
Av: 792
Av: 200
Av: 200
As you can notice, the first element doesn't appears, and the last one is duplicated.
The grammar out from lemon is:
State 0:
array_declaration ::= * LBRACKET array_value_list RBRACKET
array_list ::= * array_declaration
array_list ::= * array_list array_declaration
program ::= * array_list END_TOKEN
LBRACKET shift 3
array_declaration shift 1 /* because array_declaration==array_list */
array_list shift 1
program accept
State 1:
array_declaration ::= * LBRACKET array_value_list RBRACKET
array_list ::= array_list * array_declaration
program ::= array_list * END_TOKEN
LBRACKET shift 3
END_TOKEN shift 4
array_declaration shift-reduce 5 array_list ::= array_list array_declaration
State 2:
array_value ::= * INT_LITERAL
array_value_list ::= array_value_list * array_value
array_declaration ::= LBRACKET array_value_list * RBRACKET
INT_LITERAL shift-reduce 0 array_value ::= INT_LITERAL
RBRACKET shift-reduce 3 array_declaration ::= LBRACKET array_value_list RBRACKET
array_value shift-reduce 2 array_value_list ::= array_value_list array_value
State 3:
(1) array_value_list ::= *
array_value_list ::= * array_value_list array_value
array_declaration ::= LBRACKET * array_value_list RBRACKET
array_value_list shift 2
{default} reduce 1 array_value_list ::=
State 4:
(6) program ::= array_list END_TOKEN *
$ reduce 6 program ::= array_list END_TOKEN
----------------------------------------------------
Symbols:
0: $:
1: INT_LITERAL
2: LBRACKET
3: RBRACKET
4: END_TOKEN
5: error:
6: array_value: INT_LITERAL
7: array_value_list: <lambda> INT_LITERAL
8: array_declaration: LBRACKET
9: array_list: LBRACKET
10: program: LBRACKET
And the output trace for the sample string is:
T__Input 'LBRACKET'
T__Shift 'LBRACKET', go to state 3
T__Return. Stack=[LBRACKET]
T__Input 'INT_LITERAL'
T__Reduce [array_value_list ::=], go to state 3.
T__Shift 'array_value_list', go to state 2
T__Shift 'INT_LITERAL'
T__Return. Stack=[LBRACKET array_value_list INT_LITERAL]
T__Input 'INT_LITERAL'
T__Reduce [array_value ::= INT_LITERAL], go to state 2.
T__Shift 'array_value'
T__Reduce [array_value_list ::= array_value_list array_value], go to state 3.
T__Shift 'array_value_list', go to state 2
T__Shift 'INT_LITERAL'
T__Return. Stack=[LBRACKET array_value_list INT_LITERAL]
T__Input 'INT_LITERAL'
T__Reduce [array_value ::= INT_LITERAL], go to state 2.
T__Shift 'array_value'
T__Reduce [array_value_list ::= array_value_list array_value], go to state 3.
T__Shift 'array_value_list', go to state 2
T__Shift 'INT_LITERAL'
T__Return. Stack=[LBRACKET array_value_list INT_LITERAL]
T__Input 'INT_LITERAL'
T__Reduce [array_value ::= INT_LITERAL], go to state 2.
T__Shift 'array_value'
T__Reduce [array_value_list ::= array_value_list array_value], go to state 3.
T__Shift 'array_value_list', go to state 2
T__Shift 'INT_LITERAL'
T__Return. Stack=[LBRACKET array_value_list INT_LITERAL]
T__Input 'RBRACKET'
T__Reduce [array_value ::= INT_LITERAL], go to state 2.
T__Shift 'array_value'
T__Reduce [array_value_list ::= array_value_list array_value], go to state 3.
T__Shift 'array_value_list', go to state 2
T__Shift 'RBRACKET'
T__Return. Stack=[LBRACKET array_value_list RBRACKET]
T__Input 'LBRACKET'
T__Reduce [array_declaration ::= LBRACKET array_value_list RBRACKET], go to state 0.
T__Shift 'array_declaration', go to state 1
T__Shift 'LBRACKET', go to state 3
T__Return. Stack=[array_declaration LBRACKET]
T__Input 'INT_LITERAL'
T__Reduce [array_value_list ::=], go to state 3.
T__Shift 'array_value_list', go to state 2
T__Shift 'INT_LITERAL'
T__Return. Stack=[array_declaration LBRACKET array_value_list INT_LITERAL]
T__Input 'INT_LITERAL'
T__Reduce [array_value ::= INT_LITERAL], go to state 2.
T__Shift 'array_value'
T__Reduce [array_value_list ::= array_value_list array_value], go to state 3.
T__Shift 'array_value_list', go to state 2
T__Shift 'INT_LITERAL'
T__Return. Stack=[array_declaration LBRACKET array_value_list INT_LITERAL]
T__Input 'RBRACKET'
T__Reduce [array_value ::= INT_LITERAL], go to state 2.
T__Shift 'array_value'
T__Reduce [array_value_list ::= array_value_list array_value], go to state 3.
T__Shift 'array_value_list', go to state 2
T__Shift 'RBRACKET'
T__Return. Stack=[array_declaration LBRACKET array_value_list RBRACKET]
T__Input 'END_TOKEN'
T__Reduce [array_declaration ::= LBRACKET array_value_list RBRACKET], go to state 1.
T__Shift 'array_declaration'
T__Reduce [array_list ::= array_list array_declaration], go to state 0.
T__Shift 'array_list', go to state 1
T__Shift 'END_TOKEN', go to state 4
T__Return. Stack=[array_list END_TOKEN]
T__Input '$'
T__Reduce [program ::= array_list END_TOKEN], go to state 0.
T__Accept!
T__Return. Stack=]
I'm stuck on this error, and I'm pretty sure that is a concept error that I don't understand. Any help is appreciated.
Thanks
You don't show the definition of name_str
in your scanner code, but it seems likely that it is an array of char
. If that's the case, you are risking a buffer overrun since you never check to make sure that name_length
is less than the buffer size; furthermore, you might as well use memcpy
instead of strncpy
because you already know there is no NUL character in the string your copying. But neither of those things are actually the problem: the problem is that you are copying every token into the same buffer.
What you are passing to the parser is the address of a NUL-terminated string. The parser does not copy the string; it just stores the address as the semantic value of the token. In other words, the parser assumes that it owns the passed in token string, at least until the parse is complete.
But in fact the character buffer (name_str
) is owned by the scanner, and once it has pushed the token into the parser, it assumes that it is free to do what it will with the character buffer. Which is to overwrite the buffer with the next token.
Unlike bison, lemon does not reduce immediately when the lookahead wouldn't matter. bison would have reduced INT_LITERAL
to array_value
as soon as it saw the literal, because the reduction doesn't depend on the lookahead. But lemon always has a lookahead token, so it doesn't reduce INT_LITERAL
to array_value
until it has received the next token. Unfortunately, if the next token is also an INT_LITERAL
, the next token will have overwritten the character buffer before the reduction happens, so the reduction's action will print out the string in the next token. If the next token is a ], then the character buffer will not be overwritten by the scanner, so in that case the current token will be printed, although it was already printed by the previous token's reduction.
In general, the scanner has no way of knowing for how long the parser will require the token values. There are only two sensible ownership policies for that problem environment:
The first policy is cleaner, since it doesn't require the two components to agree on a resource management protocol. But the parser generator is not likely to include any mechanism which would allow the implementation of that policy, since it would require some action to be taken at the top of the parser function.
So you need to use the second policy; the scanner must make a copy of the token string in newly-allocated memory and pass this memory, along with its ownership, to the parser so that the parser must (eventually) free the copy. That's the same protocol which is used by most functioning bison parsers, for the same reason, and the various errors which ensue when the copy is not made are probably the most common obscure bison errors.
Of course, in this simple case, you could avoid the memory management issues by having the scanner convert the string to an integer.