parsinggrammarbnflemon

Fixing ambiguities on a Lemon parser grammar


I'm having what appears to be an ambiguous grammar. It seems like there are some problems under FileText since there is no conflict when I run only the top part (above FileText). Can anyone help me to spot where my issue is? I believe my tree looks fine. Here's an input sample:

lemon AND (#Chapter1.Title : "BNF grammar" AND #Chapter10.Title : ("BNF notion" OR "EBNF notion"))

error:

     QUOT shift        17
     QUOT reduce       14  ** Parsing conflict **
      STR shift-reduce 20  subval ::= STR
      STR reduce       14  ** Parsing conflict **
     LPAR shift         7  
     LPAR reduce       14  ** Parsing conflict **
       WS shift-reduce 10  space ::= WS
       WS reduce       14  ** Parsing conflict **
       op shift         9
    space shift        12
     text shift-reduce 15  filetext::= filetext text
 subvalue shift-reduce 15  filetext::= filetext text /*because subval==text
 {default} reduce      14  location ::= location COLON filetext

grammar:

%left::=AND.
%left::=OR.

book::= expr.

expr::= expr term.
expr::= expr op term.
expr::= term.

term::= value.
term::= QUOT STR QUOT.

value::= atom.
value::= LPAR expr RPAR.

atom::= STR.
atom::= file.

op::= space AND space.
op::= space OR space.

space::= WS.
space::= space WS.

file::= location COLON filetext.

location::= SHARP STR PERIOD STR.

filetext::= filetext text.
filetext::= filetext op text.
filetext::= text.

text::= subvalue.
text::= QUOT STR QUOT.

subvalue::= subatom.
subvalue::= LPAR filetext RPAR.

subatom::= STR.

For what is worth, the tree came up with and derived my grammar from:

enter image description here


Solution

  • The problem is that you allow implicit concatenation (i.e., without an operator) in both expr and filetext.

    Consider the concatenation expr term.

    Note that expr can derive file (through term -> value -> atom) and file ends with filetext, which in turn can be filetext text.

    Both text and term derive STR and QUOT STR QUOT. Suppose your input were something like

    SHARP STR PERIOD STR COLON      STR       STR
    ------location------       --filetext--?
    ----------------------file-------------?
    

    Now, how is that last STR to be handled?

    The parser could reduce location COLON filetext to file and then expr. Then the last STR could be reduced to term (through atom and value), so that the whole input reduces to expr using expr::=expr term.

    But it could also reduce STR to text (via subatom and subvalue), which would let it extend filetext to include the last STR. Then when it reduces location COLON filetext to file and expr, it's done.

    That's definitely an ambiguity (and it's just one of many). There's no obvious way to resolve the conflict -- at least, nothing is obvious to me -- so you'll have to figure out which of the alternatives you actually want.

    By the way, I don't think your whitespace handling corresponds to your expectations. For example, nothing in your grammar permits the whitespace which surrounds your :. You're probably better off just ignoring whitespace in your lexical analyser and dropping it from the grammar, where it just complicates things. If you don't want to allow "word"suffix to be analysed as two lexemes ("word" suffix), then you can put a check in your lexical analyser which requires a close quote to not be followed by a STR (and similar, an open quote to not be preceded by a STR). You would actually probably be better off recognising the quoted strings in your lexical analyser as well; since the difference is lexical rather than syntactic. As written, QUOT STR QUOT won't match "BNF grammar", for example (because BNF grammar doesn't match STR).