algorithmparsingll-grammar

Why am I getting different results from the Dragon Book when constructing LL(1) table?


Page 224, Algorithm 4.31.

The method for constructing an LL(1) table says:

For each production A -> alpha of the grammar, do the following.

  1. For each terminal a in FIRST(A), add A -> alpha to M[A, a].

  2. If eps is in FIRST(alpha), then for each terminal b in FOLLOW(A) add A -> alpha to M[A, b]. If eps is in FIRST(alpha) and $ is in FOLLOW(A), add A -> alpha to M[A, $] as well.

They give the following grammar and the table.

(1)  E  ->  T  E'
(2)  E' ->  +  T E'
(3)  E' ->  eps
(4)  T  ->   F  T'
(5)  T' ->  *  F  T'
(6)  T' ->  eps
(7)  F  ->  (  E  )
(8)  F  ->  id
   | id    +    *    (    )    $
---+-----------------------------
E  | 1               1
E' |       2              3    3
T  | 4               4
T' |       6    5         6    6      
F  | 8               7

For reference, there are the FIRST/FOLLOW sets; taken from the book.

     FIRST    FOLLOW
E    ( id     ) $
E'   + eps    ) $
T    ( id     + ) $
T'   * eps    + ) $
F    ( id     + * ) $

When I run the algorithm (both by hand and using an implementation that I wrote based on the book), I get the following.

   | id    +    *    (    )    $
---+-----------------------------
E  | 1               1
E' |       2|3            3    3
T  | 4               4
T' |       5|6  5         6    6      
F  | 7|8             7|8

The difference is that I have some conflicts that the book avoids.

Here's my work, going through the algorithm. The first two steps are apparently the same as theirs, but we part ways on the third step. The other conflicts in my table appear for the same reason.

Production (1) E -> T E'

  1. For each terminal x in FIRST(E) = { (, id }, add (1) to M[E, x].
   | id    +    *    (    )    $
---+-----------------------------
E  | 1               1
E' |
T  |
T' |
F  |
  1. If eps is in FIRST(T E') = { (, id }... it's not.

Production (2) E' -> + T E'

  1. For each terminal x in FIRST(E') = { +, eps }, add (2) to M[E', x].
   | id    +    *    (    )    $
---+-----------------------------
E  | 1               1
E' |       2
T  |
T' |
F  |
  1. If eps is in FIRST(+ T E') = { + }... it's not.

Production (3) E' -> eps

  1. For each terminal x in FIRST(E') = { +, eps }, add (3) to M[E', x].
   | id    +    *    (    )    $
---+-----------------------------
E  | 1               1
E' |       2|3
T  |
T' |
F  |

And here's the conflict.

The books says:

Since FOLLOW(E') = { ), $ }, production (3) E' -> eps is added to M[E', )] and M[E', $].

And we agree on this one: that's what I'd do next as well. But before that, the first step says:

  1. For each terminal a in FIRST(A), add A -> alpha to M[A, a].

Applied to this rule, FIRST(E') is { +, eps }. So M[E', +] should get (3) E' -> eps.

You can see the same pattern in the next two rows. And in the last row, productions (7) F -> ( E ) and (8) F -> id will both have the same FIRST(F) = { (, id }, and thus M[F, id] and M[F, (] will both get (7) and (8).


What's implied in the algorithm given in the book that I'm missing?

Also, here's a log of what my implementation gives. I have the same reasoning when I do the algorithm by hand, so please do not ask for code. I am looking for a mistake in my interpretation of the algorithm, not the implementation.

#########
## Log ##
#########
Looking at rule Expression -> Term Expression_.
  Adding "Expression -> Term Expression_" to M[Expression, OpenParen] due to step 1.
  Adding "Expression -> Term Expression_" to M[Expression, Identifier] due to step 1.
Looking at rule Expression_ -> Plus Term Expression_.
  Adding "Expression_ -> Plus Term Expression_" to M[Expression_, Plus] due to step 1.
Looking at rule Expression_ -> ε.
  Adding "Expression_ -> ε" to M[Expression_, Plus] due to step 1.
  Adding "Expression_ -> ε" to M[Expression_, CloseParen] due to step 2a.
  Adding "Expression_ -> ε" to M[Expression_, $] due to step 2b.
Looking at rule Term -> Factor Term_.
  Adding "Term -> Factor Term_" to M[Term, OpenParen] due to step 1.
  Adding "Term -> Factor Term_" to M[Term, Identifier] due to step 1.
Looking at rule Term_ -> Times Factor Term_.
  Adding "Term_ -> Times Factor Term_" to M[Term_, Times] due to step 1.
Looking at rule Term_ -> ε.
  Adding "Term_ -> ε" to M[Term_, Times] due to step 1.
  Adding "Term_ -> ε" to M[Term_, Plus] due to step 2a.
  Adding "Term_ -> ε" to M[Term_, CloseParen] due to step 2a.
  Adding "Term_ -> ε" to M[Term_, $] due to step 2b.
Looking at rule Factor -> OpenParen Expression CloseParen.
  Adding "Factor -> OpenParen Expression CloseParen" to M[Factor, OpenParen] due to step 1.
  Adding "Factor -> OpenParen Expression CloseParen" to M[Factor, Identifier] due to step 1.
Looking at rule Factor -> Identifier.
  Adding "Factor -> Identifier" to M[Factor, OpenParen] due to step 1.
  Adding "Factor -> Identifier" to M[Factor, Identifier] due to step 1.


#################
## Final table ##
#################
Expression + OpenParen => { Expression -> Term Expression_ }
Expression + Identifier => { Expression -> Term Expression_ }
Expression_ + Plus => { Expression_ -> Plus Term Expression_ , Expression_ -> ε }
Expression_ + CloseParen => { Expression_ -> ε }
Expression_ + Symbol($) => { Expression_ -> ε }
Term + OpenParen => { Term -> Factor Term_ }
Term + Identifier => { Term -> Factor Term_ }
Term_ + Times => { Term_ -> Times Factor Term_ , Term_ -> ε }
Term_ + Plus => { Term_ -> ε }
Term_ + CloseParen => { Term_ -> ε }
Term_ + Symbol($) => { Term_ -> ε }
Factor + OpenParen => { Factor -> OpenParen Expression CloseParen , Factor -> Identifier }
Factor + Identifier => { Factor -> OpenParen Expression CloseParen , Factor -> Identifier }

Solution

  • This looks like an error in the book.

    For each terminal a in FIRST(A), add A -> alpha to M[A, a].

    There should be FIRST(alpha), not FIRST(A). Indeed, if you see F -> (E), you need to add this rule to M[F, '('], and if you see F -> id, add this to M[F,id]. It makes no sense to add both rules to both states.


    Here's the fixed algorithm from the book (emphasis on the changed part):

    For each production A -> alpha of the grammar, do the following.

    1. For each terminal a in FIRST(alpha), add A -> alpha to M[A, a].
    2. If eps is in FIRST(alpha), then for each terminal b in FOLLOW(A) add A -> alpha to M[A, b]. If eps is in FIRST(alpha) and $ is in FOLLOW(A), add A -> alpha to M[A, $] as well.

    Here's the correct order for creating the table:

    Looking at rule Expression -> Term Expression_.
      Adding "Expression -> Term Expression_" to M[Expression, OpenParen] due to step 1.
      Adding "Expression -> Term Expression_" to M[Expression, Identifier] due to step 1.
    Looking at rule Expression_ -> Plus Term Expression_.
      Adding "Expression_ -> Plus Term Expression_" to M[Expression_, Plus] due to step 1.
    Looking at rule Expression_ -> ε.
      Adding "Expression_ -> ε" to M[Expression_, CloseParen] due to step 2a.
      Adding "Expression_ -> ε" to M[Expression_, $] due to step 2b.
    Looking at rule Term -> Factor Term_.
      Adding "Term -> Factor Term_" to M[Term, OpenParen] due to step 1.
      Adding "Term -> Factor Term_" to M[Term, Identifier] due to step 1.
    Looking at rule Term_ -> Times Factor Term_.
      Adding "Term_ -> Times Factor Term_" to M[Term_, Times] due to step 1.
    Looking at rule Term_ -> ε.
      Adding "Term_ -> ε" to M[Term_, Plus] due to step 2a.
      Adding "Term_ -> ε" to M[Term_, CloseParen] due to step 2a.
      Adding "Term_ -> ε" to M[Term_, $] due to step 2b.
    Looking at rule Factor -> OpenParen Expression CloseParen.
      Adding "Factor -> OpenParen Expression CloseParen" to M[Factor, OpenParen] due to step 1.
    Looking at rule Factor -> Identifier.
      Adding "Factor -> Identifier" to M[Factor, Identifier] due to step 1.