I am having trouble understanding what the rules are for adding a lookahead to a core production during the construction of the DFA. To illustrate my confusion, I will be using an online parser generator that exposes all the internal calculations; this_tool. (<- open in a new tab)
(The formating is: NONTERMINAL -> RULE, LOOKAHEADS
, where the lookaheads are forward slash sperated)
Using this grammar as an example:
S -> E
E -> ( E )
E -> N O E
E -> N
N -> 1
N -> 2
N -> 3
O -> +
O -> -
Copy and pasting the above grammar into the lalr parser generator will produce a dfa with 12 states (click the >>
). My question is finally, why are the goto(0, N)
kernel productions ( {[E -> N.O E, $/)]; [E -> N., $/)]}
) initiated with the )
terminal? Where does the )
come from? I would expect the goto(0, N)
to be {[E -> N.O E, $]; [E -> N., $]}
. Equally the kernel production in the goto(0, ( )
has an 'extra' )
.
As the dfa is being constructed, equal cores are merged (the core is the set of productions that introduce a new state by performing closure on that set). State 2 has production [E -> .N, )];
, which when merged with [E -> N., $]
produces the correct output, but there's no way for state 0 to have known about lookahead of )
Thanks in advance, sorry if this was a confusing and specific question and about using an external website to demonstrate my issue.✌️
The solution is to propagate any newly found lookaheads then 'goto' the states where those lookaheads are cores of.
The method is described in chapter 4 section 7.5 of the Dragon Book 2nd ed. (here: https://github.com/muthukumarse/books/blob/master/Dragon%20Book%20Compilers%20Principle%20Techniques%20and%20Tools%202nd%20Edtion.pdf)