parsinglalr

Why is my parser generating reporting that this LALR(1) grammar is not LALR(1)?


OK, my adventures with my parser generator continued. This time I got my hands on a classic grammar, which is said to be a LALR grammar:

start -> a a
a -> "A" a
a -> "B"

When I put it into my parser generator it gives me that output:

LIST OF STATES:
-----------------------
<0: S' -> . start , $
start -> . a a , $
a -> . "A" a , "A" / "B"
a -> . "B" , "A" / "B"
{NTerm(start): 1, Term(A): 2, Term(B): 3, NTerm(a): 4}>(3104365621624877555)

--------------------
<1: S' -> start .  , $
{}>(3969511602615904846)

--------------------
<2: a -> "A" . a , "A" / "B"
a -> . "A" a , "A" / "B"
a -> . "B" , "A" / "B"
{Term(A): 2, Term(B): 3, NTerm(a): 5}>(5490562805113673592)

--------------------
<3: a -> "B" .  , "A" / "B"
{}>(-4845209343945471034)

--------------------
<4: start -> a . a , $
a -> . "A" a , $
a -> . "B" , $
{Term(A): 6, Term(B): 7, NTerm(a): 8}>(598157158659875896)

--------------------
<5: a -> "A" a .  , "A" / "B"
{}>(436327415052220213)

--------------------
<6: a -> "A" . a , $
a -> . "A" a , $
a -> . "B" , $
{Term(A): 6, Term(B): 7, NTerm(a): 9}>(5490562805113673592)

--------------------
<7: a -> "B" .  , $
{}>(-4845209343945471034)

--------------------
<8: start -> a a .  , $
{}>(5795088700656730485)

--------------------
<9: a -> "A" a .  , $
{}>(436327415052220213)

POSSIBLE STATES TO JOIN: (2, 6), (3, 7), (5, 9)
ATTEMPTING CONVERSION TO LALR GRAMMAR...FAILED
CONTINUING WITH CLR(1)...

These states match what I can read in other sources about compiling LALR grammars — this step looks OK, it produces correct states like if I had done it by hand. The generator suggest — again it is what other sources say about conversing CLR(1) grammar into LALR — that states (2,6),(3,7),(5,9) can be joined, but it is unable to do so. When I look at produced action and goto tables I can see why:

Resulting action and goto table

As you can see states 2 and 6 can't be joined, because there are incompatible items s2 <> s6, s3 <> s7 and so on.

But what amazes me the most is the generator finished its work and produced a running program. When I run this program on test data, it accepts the data! So, my generator produced the correct tables.

Does it mean this classic "LALR" grammar is LALR only when a human does the compilation by hand? What my parser generator does differently?


Solution

  • I think the issue is here:

    As you can see states 2 and 6 can't be joined, because there are incompatible items s2 <> s6, s3 <> s7 and so on.

    That's actually not quite right. Notice that in state 2, the shift items take you to, respectively, states 2 and 3. In state 6, the shift items take you to, respectively, states 6 and 7. But there's no incompatibility there, because you're attempting to merge states 2 and 6 together and to merge states 3 and 7 together. In other words, yes, those shifts do currently say to go to different places, but after you collapse all states with the same core together, you'll end up with them all saying to go to the same place.

    More generally, I don't believe there's any case where merging two LR(1) states will introduce a "shift/shift" conflict. To see why this is, note that each LR(1) state corresponds to a state in an LR(0) parser, except that each LR item has been annotated with a set of lookahead items. In going from LR(1) to LALR(1), you're combining together LR(1) states with the same items ignoring lookaheads, meaning that you're essentially combining together LR(1) states that correspond to the same LR(0) state.

    As a result, if you have an LR(1) state S that says "go to state T on symbol a", then the LR(0) state corresponding to S has a transition to the LR(0) state corresponding to T, and that's also going to be true for any LR(1) state that S is going to be merged with.

    The only conflicts that can arise in the course of building an LALR(1) parser are reduce/reduce conflicts, which is what you need to be on guard for.