I'm trying to implement an LR1 Parser, using the help in this wikipedia article. I have made a closure function for the LR0 Items, and when trying to make a closure for the LR1 items (the part about adding lookaheads to the LR0 Items) I need to compute the FIRST and FOLLOW sets, however the wiki refers to 2 FOLLOW sets, one that takes in an LR0 item, and another that only takes in an itemset and a nonterminal. The second one is easy to understand:
pub fn lr0_follow_in_item_set(
rules: &[Rule],
item_set: &[LR0Item],
nt: &NonTerminal,
) -> BTreeSet<Terminal> {
let mut res = BTreeSet::new();
for item in item_set { //
if Some(TerminalOrNonTerminal::NonTerminal(*nt)) == item.next_symbol(rules) { // item in item set where dot is before the desired non terminal
res.extend(lr0_follow(rules, item)); // union of the follow sets
}
}
res
}
However I cannot find an implementation of FOLLOW for an LR0 item (lr0_follow(rules, item)
), is it the same as FOLLOW(nonterminal), where nonterminal is the nonterminal after dot of item (if any otherwise return empty set)?
NOTE: My LR1 parser doesn't use epsilons.
I don't know what to try, as I couldn't find any resources for this specific FOLLOW set.
You are more or less correct -- with LR(0), you don't worry about lookaheads at all, so when you would need a lookahead, you instead look at FOLLOW(nonterminal)
To put things in the terms of the wikipedia article you linked1, when you have an item "I: A → α ·B β, x", with LR(0), x (the lookahead) is always empty. So when computing the FOLLOW(I), instead of using x when FIRST(β) contains epsilon, you use FOLLOW(A) from the grammar.
So the equation you end up with is
if ε ∈ FIRST(β)
FOLLOW(I) = (FIRST(β) - {ε}) ∪ FOLLOW(A)
else
FOLLOW(I) = FIRST(β)
1You need to be somewhat careful with the details on that wiki page -- when they show "the full item set 0 for LR(1)", they actually have the full item set for LALR(1), without really talking about what the difference between LR(1) and LALR(1) is