I'm working on a parser, using GNU bison, and I'm facing an interesting problem. My actual problem is a bit different and less interesting in general, so I will state it a bit differently, so that an answer would be more useful in general.
I need to distinguish expressions depending on their type, like arithmetic expressions from string expressions. Their top-level nonterminals have some common ancestors, like
statement
: TOK_PRINT expression TOK_SEMICOLON {...}
;
expression
: arithmeticexpression {...}
| stringexpression {...}
;
Now I need to be able to have variables in both kinds of expressions
arithmeticexpression
: TOK_IDENTIFIER {...}
;
stringexpression
: TOK_IDENTIFIER {...}
;
(only the variables of string-type should be allowed in the stringexpression case and only variables of int- or float-type should be allowed in the arithmeticexpression case) But this obviously causes an R/R conflict, which is inherent in the language - it is impossible to resolve, because the language is ambiguous.
Of course I could water down the language, so that only general "Expression" objects are passed around on the parse-stack, but then I'd have to do a lot of manual type-checking in the actions, which I would like to avoid. Also, in my real use-case, the paths through the grammar to the variable-rules are so different, that I would have to water the language down so much that I would lose a lot of grammar-rules (i.e. lose a lot of structural information) and would need to hand-write parsing-engines into some actions.
I have read about GLR-parsing, which sounds like it could solve my problem. I'm considering to use this feature: have the grammar as above and YYERROR
in the paths where the corresponding variable has the wrong type.
arithmeticexpression
: TOK_IDENTIFIER {
if(!dynamic_cast<IntVariable*>(
symbol_table[*$<stringvalue>1]))
YYERROR;
}
;
stringexpression
: TOK_IDENTIFIER {
if(!dynamic_cast<StringVariable*>(
symbol_table[*$<stringvalue>1]))
YYERROR;
}
;
but the Bison Manual says
- During deterministic GLR operation, the effect of YYERROR is the same as its effect in a deterministic parser.
- The effect in a deferred action is similar, but the precise point of the error is undefined; instead, the parser reverts to deterministic operation, selecting an unspecified stack on which to continue with a syntax error.
- In a semantic predicate (see Semantic Predicates) during nondeterministic parsing, YYERROR silently prunes the parse that invoked the test.
I'm not sure I understand this correctly - I understand it this way:
Does anyone have experience with such a situation? Is my understanding of the manual wrong? How would you approach this problem? To me, the problem seems to be so natural that I would assume people run into it often enough that bison would have a solution built-in...
The example here works, but when I put it together with the solution for Bison: GLR-parsing of valid expression fails without error message I ran into the following problem:
The variable could be determined by more than one identifier. You could have an array-index or it could be a member of a different object. I modeled this situation by having another non-terminal like
lvalue
: TOK_IDENTIFIER
| lvalue '[' arithmeticexpression ']'
| lvalue '.' TOK_IDENTIFIER
But when I now have
arithmeticexpression : lvalue
stringexpression : lvalue
and (try to) access the object from the non-terminal "lvalue" like above, I get a segmentation fault. So it seems that the method here doesn't work for more complex situations.
What I did instead now is that I access the object in the semantic ACTION and set $$
to nullptr
if the variable has the wrong type.
arithmeticexpression
: lvalue
{
$$ = nullptr;
if(dynamic_cast<IntVariable*>($lvalue->getVariable()))
$$ = new ArithmeticExpressionIntVariable($lvalue);
}
Then I had to propagate the nullptr
(this is quite a lot of additional code) like this
arithmeticexpression
...
| arithmeticexpression[left] '+' arithmeticexpressionM[right]
{
$$ = nullptr;
if($left && $right)
$$ = new ArithmeticExpressionPlus($left, $right);
}
So now, at the
expression
: arithmeticexpression
| stringexpression
(note: I have "Expression* expr;" in the "%union{" declaration
and "%type <expression> expr" in the prologue)
I have an ambiguity: the same input text can be parsed in two different ways, but only one of them will have a value != nullptr
. At this point, I need a custom merge procedure which basically just selects the non-null value.
For this, I forward-declared it in the prologue of my bison file like this
static Expression* exprMerge (yy::semantic_type x0, yy::semantic_type x1);
and defined it in the epilogue like this
static Expression* exprMerge (YYSTYPE x0, YYSTYPE x1) \
{
/* otherwise we'd have a legitimate ambiguity */
assert(!x0.expr || !x1.expr);
return x0.expr ? x0.expr : x1.expr;
}
finally, I had to tell bison to use this procedure to resolve the ambiguity, using
expression
: arithmeticexpression %merge <exprMerge>
| stringexpression %merge <exprMerge>
Honestly, I think this is quite a lot of effort, which wouldn't be necessary if bison tested semantic predicates (at least the ones which are behind the rule) when it tries to merge the paths, not to rule out paths before they merge.
But at least this works and is far less effort than collecting the tokens and sorting them manually, which would have been the alternative.