parsing lalr

LALR(1) Parser Not Parsing The Text At All


I have to admit I'm an absolute newbie in this and might not even understand what I am doing.

I am trying to make a grammar that at least contains grammar from BABA IS YOU, and if possible expands on it. I am using this tool to debug my grammar: http://jsmachines.sourceforge.net/machines/lalr1.html

Admittedly, my grammar is currently not LALR(1), as shown by its many shift/reduce conflicts, which I am unsure how to resolve properly.

So, when I enter "RED AND BLUE BABA IS YOU" into the parser, this is what I expect to see:

[Image: expected parse tree for "RED AND BLUE BABA IS YOU"]

And yet what I see is:

[Image: the unexpected outcome]

I have no idea where to start digging to understand my problem, and I need help with at least that.

The Grammar I use is this: https://pastebin.com/5MHZrFLe

sentence' -> sentence
 
sentence -> give
 
give -> giver property
giver -> noun IS
 
selector -> adjective noun
 
multinoun -> noun AND
multinoun -> multinoun AND
multinoun -> multinoun noun
 
multiadjective -> adjective AND
multiadjective -> multiadjective AND
multiadjective -> multiadjective adjective
 
noun -> multinoun
noun -> selector
 
noun -> BABA
noun -> KEKE
noun -> ROBOT
 
adjective -> RED
adjective -> BLUE
adjective -> GREEN
 
property -> YOU

Solution

  • In order for the token AND in that sentence to be recognised, there would have to be a derivation sequence leading from sentence' to multiadjective. There is no such sequence, as can easily be verified with a simple reachability analysis over the grammar's symbols (which is just a DFS starting at sentence').

    That makes multiadjective useless in that grammar. It's slightly surprising that the tool you use doesn't warn you about that.
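The reachability check mentioned above can be sketched in a few lines of Python. This is only an illustration: the helper names `parse_rules` and `reachable` are made up for this example and have nothing to do with the online tool. It starts a DFS at sentence' and reports every non-terminal that is never visited.

```python
# The grammar from the question, one production per line.
GRAMMAR = """
sentence' -> sentence
sentence -> give
give -> giver property
giver -> noun IS
selector -> adjective noun
multinoun -> noun AND
multinoun -> multinoun AND
multinoun -> multinoun noun
multiadjective -> adjective AND
multiadjective -> multiadjective AND
multiadjective -> multiadjective adjective
noun -> multinoun
noun -> selector
noun -> BABA
noun -> KEKE
noun -> ROBOT
adjective -> RED
adjective -> BLUE
adjective -> GREEN
property -> YOU
"""


def parse_rules(text):
    """Map each non-terminal to the list of its right-hand sides."""
    rules = {}
    for line in text.strip().splitlines():
        lhs, rhs = line.split("->")
        rules.setdefault(lhs.strip(), []).append(rhs.split())
    return rules


def reachable(rules, start):
    """Iterative DFS over symbols, following every production of each symbol."""
    seen = set()
    stack = [start]
    while stack:
        symbol = stack.pop()
        if symbol in seen:
            continue
        seen.add(symbol)
        # Terminals have no productions, so rules.get() yields nothing for them.
        for rhs in rules.get(symbol, []):
            stack.extend(rhs)
    return seen


rules = parse_rules(GRAMMAR)
unreachable = sorted(set(rules) - reachable(rules, "sentence'"))
print(unreachable)  # ['multiadjective']
```

The only non-terminal left over is multiadjective, because it appears on a right-hand side only in its own productions.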

    That's not the case for multinoun, which is reachable through the noun -> multinoun production. However, that creates a number of ambiguities, leading to shift/reduce conflicts. One example:

    noun -> multinoun -> multinoun AND
    

    vs

    noun -> multinoun -> noun AND -> multinoun AND
    

    The general pattern for a bottom-up grammar representing a list of token-separated items is:

    list -> item
    list -> list separator item
    

    In such a grammar, the outer production refers to the list through the non-terminal list, not item. Adding item -> list just so that the list can be referred to as item creates ambiguities, and your noun -> multinoun production more or less reproduces exactly that error.
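To make the list pattern concrete, here is a minimal hand-written sketch in Python. It is not LALR(1) table-driven parsing and not a definitive fix for your grammar. It assumes a restructured grammar in which a sentence is `nounlist IS property`, a noun list is `selector` or `nounlist AND selector`, and a selector is an optional AND-separated adjective list followed by a single noun. All function names and the shape of the result are hypothetical. The `while` loop in `parse_list` plays the role of the left-recursive rule `list -> list separator item`.

```python
NOUNS = {"BABA", "KEKE", "ROBOT"}
ADJECTIVES = {"RED", "BLUE", "GREEN"}
PROPERTIES = {"YOU"}


def parse_list(tokens, pos, parse_item):
    """list -> item | list AND item, collected into a flat Python list."""
    item, pos = parse_item(tokens, pos)
    items = [item]
    while pos < len(tokens) and tokens[pos] == "AND":
        item, pos = parse_item(tokens, pos + 1)
        items.append(item)
    return items, pos


def parse_adjective(tokens, pos):
    if pos < len(tokens) and tokens[pos] in ADJECTIVES:
        return tokens[pos], pos + 1
    raise SyntaxError(f"expected an adjective at position {pos}")


def parse_selector(tokens, pos):
    """selector -> NOUN | adjective_list NOUN (assumed restructuring)."""
    adjectives = []
    if pos < len(tokens) and tokens[pos] in ADJECTIVES:
        adjectives, pos = parse_list(tokens, pos, parse_adjective)
    if pos < len(tokens) and tokens[pos] in NOUNS:
        return (adjectives, tokens[pos]), pos + 1
    raise SyntaxError(f"expected a noun at position {pos}")


def parse_sentence(text):
    """sentence -> noun_list IS property; the outer rule uses the list, not the item."""
    tokens = text.split()
    subjects, pos = parse_list(tokens, 0, parse_selector)
    if pos >= len(tokens) or tokens[pos] != "IS":
        raise SyntaxError(f"expected IS at position {pos}")
    pos += 1
    if pos >= len(tokens) or tokens[pos] not in PROPERTIES:
        raise SyntaxError(f"expected a property at position {pos}")
    if pos + 1 != len(tokens):
        raise SyntaxError("unexpected tokens after the property")
    return subjects, tokens[pos]


print(parse_sentence("RED AND BLUE BABA IS YOU"))
# ([(['RED', 'BLUE'], 'BABA')], 'YOU')
```

Note that the item parsers never call back into the list rule for their own kind (there is no equivalent of item -> list), which is what keeps each sentence down to a single reading. Whether both adjectives modifying BABA is the reading you want is an assumption on my part.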