parsingcompilationcompiler-constructionlr-grammarlalr

Can LR(1) parsing table size be the same as LALR(1) parsing table for some grammar?


I know that LALR(1) grammars are a subset of LR(1) grammars and most of the time LALR(1) parsing table is much smaller than LR(1) parsing table for the same grammar. But I couldn't find the answer on the internet and SO whether a grammar exists for which they have the same parsing table size. It sounds possible, since LALR is essentially "collapsing" compatibile states, merging those that do not conflict, but is there a grammar both LR(1) and LALR(1) with same parsing table size for both?


Solution

  • Intuitively, an LALR(1) parser is formed by starting with an LR(1) parser and repeatedly merging states together when those states are identical with the exception of lookaheads. Therefore, an LR(1) and LALR(1) parser for a grammar will be the same whenever there aren't any states of this sort to merge together.

    In that case, the parsing tables for the two parsers will be completely identical. The GOTO tables will be the same because the two parsers have the same states and same transitions, and the ACTION tables will be the same because the shift and reduce items in each state are the same.

    I believe that the specific requirement that has to hold for the LALR(1) and LR(1) automata to be the same is that the grammar's LR(0) and LR(1) parsers must be identical to one another, ignoring lookaheads. Specifically:

    So in other words, yes, this is possible. More precisely:

    Theorem: A grammar G has identical LALR(1) and LR(1) parsers if and only if it has identical LR(0) and LR(1) parsers, ignoring lookaheads.

    This means that any grammar that has this property must be LR(0), in which case you wouldn't want to use an LALR(1) parser. However, not all LR(0) grammars have this property. For example, consider this grammar:

    S -> aTbT
    T -> c
    

    Here's the LR(1) parser:

    (1)
    S' -> .S    [$]
    S  -> .aTbT [$]
    
    (2)
    S  -> a.TbT [$]
    T  -> .c    [b]
    
    (3)
    T ->  c.    [b]
    
    (4)
    S -> aT.bT  [$]
    
    (5)
    S -> aTb.T  [$]
    T -> .c     [$]
    
    (6)
    T ->  c.    [$]
    
    (7)
    S -> aTbT.  [$]
    
    (8)
    S' -> S     [$]
    

    Here, states (3) and (6) would be combined together in an LALR(1) parser, so the LR(1) and LALR(1) parsers aren't the same. However, you can check that this grammar is indeed LR(0), showing that only a subset of the LR(0) grammars have this property.