[SOLVED] SLR(1) and LALR(1) grammar conflicts

SLR(1) and LALR(1) grammar conflicts

Suppose you have a grammar G and we find an LR(1) automaton for it. We can transform it into a LALR(1) or SLR(1) parser by doing state-merging and transforming rules but conflicts may appear.

My question is the following: must all problems appear in merged states? Is it possible for a non-conflict LR(1) state that wasn't merged to have a conflict either in LALR(1) or in SLR(1) automaton?

Solution

Interesting question! The answer is

if a grammar is LR(1), any conflicts in the LALR(1) parser must occur in merged states, but
if a grammar is LR(1), conflicts may appear in LR(1) states that were not merged.

For the first point, suppose you have a grammar that’s LR(1), so you can form its LR(1) parser. We can convert that to an LALR(1) parser by merging together all states with the same productions, ignoring lookaheads. If you have an LR(1) state that doesn’t get merged with anything, then that LR(1) state is present verbatim in the LALR(1) parser. And since the LR(1) state has no shift/reduce or reduce/reduce conflicts, the corresponding LALR(1) parser state won’t have any conflicts.

On the SLR(1) front, you can end up with states where no LR(1) state merging would occur, yet there's a reduce/reduce conflict. The intuition behind this is that you can have a state with no reduce/reduce conflicts in the LR(1) parser because the lookaheads have enough detail to resolve the conflict, yet when switching from LR(1) to SLR(1) and expanding the lookahead sets we accidentally introduce a reduce/reduce conflict. Here's an example of a grammar where this happens:

S → aTb | aR | cT
T → d
R → d

Here's the LR(1) configurating sets:

(1)
   S' -> .S    [$]
   S  -> .aTb  [$]
   S  -> .aR   [$]
   S  -> .cT   [$]

(2)
   S' -> S.    [$]

(3)
   S -> a.Tb   [$]
   S -> a.R    [$]
   T -> .d     [b]
   R -> .d     [$]

(4)
   T -> d.     [b]
   R -> d.     [$]

(5)
   S -> aT.b   [$]

(6)
   S -> aTb.   [$]

(7)
   S -> aR.    [$]

(8)
   S -> c.T    [$]
   T -> .d     [$]

(9)
   T -> d.     [$]
   
(10)
   S -> cT.    [$]

These are the same item sets that you'd have in the SLR(1) parser. Notice, also, that FOLLOW(T) = {$, b}. This means that the LR(1) state

(4)
   T -> d.     [b]
   R -> d.     [$]

is converted to the SLR(1) state

(4)
   T -> d.     [b, $]
   R -> d.     [$]

which has a reduce/reduce conflict on $.