parsinglr-grammarlr1

LR(1) item sets for left recursive grammar


I read several papers about creating LR(1) item sets, but none of them pertained to left recursive grammars, such as those used for parsing expressions. If I had the following grammar,

E -> E + T | T
T -> T * F | F
F -> (E) | NUMBER

How would I go about creating the LR(1) item sets?


Solution

  • Left recursion isn't inherently a problem for LR(1) parsers, and the rules for determining the configurating sets and lookaheads is the same regardless of whether your grammar has left recursion.

    In your case, we begin by augmenting the grammar with our new start symbol:

    S -> E
    E -> E + T | T
    T -> T * F | F
    F -> (E) | NUMBER
    

    Our initial configurating set corresponds to looking at the production S -> E with a lookahead of $. Initially, that gives us the following:

    (1)
        S -> .E     [$]
    

    We now need to expand out what E could be. That gives us these new items:

    (1)
        S -> .E     [$]
        E -> .E + T [$]
        E -> .T     [$]
    

    Now, let's look at the item E -> .E + T [$]. We need to expand out what E could be here, and the rules for doing so are the same as in the non-left-recursive case: we list all productions for E with the dot at the front, with a lookahead given by what follows the E in the production E -> .E + T [$]. In this case we're looking for an E with a lookahead of + because that's what follows is in the production. That adds these items:

    (1)
        S -> .E     [$]
        E -> .E + T [$]
        E -> .T     [$]
        E -> .E + T [+]
        E -> .T     [+]
    

    From here, we expand out all the cases where there's a dot before a T, which gives the following:

    (1)
        S -> .E     [$]
        E -> .E + T [$]
        E -> .T     [$]
        E -> .E + T [+]
        E -> .T     [+]
        T -> .T * F [$]
        T -> .F     [$]
        T -> .T * F [+]
        T -> .F     [+]
    

    We now have to expand out the Ts in the context of T -> .T * F [$], and we do so by listing all productions of T followed by what the T is followed by in T -> .T * F [$] (namely, *). That gives us the following:

    (1)
        S -> .E     [$]
        E -> .E + T [$]
        E -> .T     [$]
        E -> .E + T [+]
        E -> .T     [+]
        T -> .T * F [$]
        T -> .F     [$]
        T -> .T * F [+]
        T -> .F     [+]
        T -> .T * F [*]
        T -> .F     [*]
    

    And from here we'd expand out the productions that have a dot before the F. Do you see how to do that based on how things have played out so far?