I'm having what appears to be an ambiguous grammar. It seems like there are some problems under FileText since there is no conflict when I run only the top part (above FileText). Can anyone help me to spot where my issue is? I believe my tree looks fine. Here's an input sample:
lemon AND (#Chapter1.Title : "BNF grammar" AND #Chapter10.Title : ("BNF notion" OR "EBNF notion"))
error:
QUOT shift 17
QUOT reduce 14 ** Parsing conflict **
STR shift-reduce 20 subval ::= STR
STR reduce 14 ** Parsing conflict **
LPAR shift 7
LPAR reduce 14 ** Parsing conflict **
WS shift-reduce 10 space ::= WS
WS reduce 14 ** Parsing conflict **
op shift 9
space shift 12
text shift-reduce 15 filetext::= filetext text
subvalue shift-reduce 15 filetext::= filetext text /*because subval==text
{default} reduce 14 location ::= location COLON filetext
grammar:
%left::=AND.
%left::=OR.
book::= expr.
expr::= expr term.
expr::= expr op term.
expr::= term.
term::= value.
term::= QUOT STR QUOT.
value::= atom.
value::= LPAR expr RPAR.
atom::= STR.
atom::= file.
op::= space AND space.
op::= space OR space.
space::= WS.
space::= space WS.
file::= location COLON filetext.
location::= SHARP STR PERIOD STR.
filetext::= filetext text.
filetext::= filetext op text.
filetext::= text.
text::= subvalue.
text::= QUOT STR QUOT.
subvalue::= subatom.
subvalue::= LPAR filetext RPAR.
subatom::= STR.
For what is worth, the tree came up with and derived my grammar from:
The problem is that you allow implicit concatenation (i.e., without an operator) in both expr
and filetext
.
Consider the concatenation expr term
.
Note that expr
can derive file
(through term -> value -> atom
) and file
ends with filetext
, which in turn can be filetext text
.
Both text
and term
derive STR
and QUOT STR QUOT
. Suppose your input were something like
SHARP STR PERIOD STR COLON STR STR
------location------ --filetext--?
----------------------file-------------?
Now, how is that last STR
to be handled?
The parser could reduce location COLON filetext
to file
and then expr
. Then the last STR
could be reduced to term
(through atom
and value
), so that the whole input reduces to expr
using expr::=expr term
.
But it could also reduce STR
to text
(via subatom
and subvalue
), which would let it extend filetext
to include the last STR
. Then when it reduces location COLON filetext
to file
and expr
, it's done.
That's definitely an ambiguity (and it's just one of many). There's no obvious way to resolve the conflict -- at least, nothing is obvious to me -- so you'll have to figure out which of the alternatives you actually want.
By the way, I don't think your whitespace handling corresponds to your expectations. For example, nothing in your grammar permits the whitespace which surrounds your :
. You're probably better off just ignoring whitespace in your lexical analyser and dropping it from the grammar, where it just complicates things. If you don't want to allow "word"suffix
to be analysed as two lexemes ("word" suffix
), then you can put a check in your lexical analyser which requires a close quote to not be followed by a STR
(and similar, an open quote to not be preceded by a STR
). You would actually probably be better off recognising the quoted strings in your lexical analyser as well; since the difference is lexical rather than syntactic. As written, QUOT STR QUOT
won't match "BNF grammar"
, for example (because BNF grammar
doesn't match STR
).