parsingrustcompiler-constructionlalrlr1

How to find the FOLLOW set of an LR0 Item?


I'm trying to implement an LR1 Parser, using the help in this wikipedia article. I have made a closure function for the LR0 Items, and when trying to make a closure for the LR1 items (the part about adding lookaheads to the LR0 Items) I need to compute the FIRST and FOLLOW sets, however the wiki refers to 2 FOLLOW sets, one that takes in an LR0 item, and another that only takes in an itemset and a nonterminal. The second one is easy to understand:

pub fn lr0_follow_in_item_set(
    rules: &[Rule],
    item_set: &[LR0Item],
    nt: &NonTerminal,
) -> BTreeSet<Terminal> {
    let mut res = BTreeSet::new();
    for item in item_set { // 
        if Some(TerminalOrNonTerminal::NonTerminal(*nt)) == item.next_symbol(rules) { // item in item set where dot is before the desired non terminal
            res.extend(lr0_follow(rules, item)); // union of the follow sets
        }
    }
    res
}

However I cannot find an implementation of FOLLOW for an LR0 item (lr0_follow(rules, item)), is it the same as FOLLOW(nonterminal), where nonterminal is the nonterminal after dot of item (if any otherwise return empty set)?

NOTE: My LR1 parser doesn't use epsilons.

I don't know what to try, as I couldn't find any resources for this specific FOLLOW set.


Solution

  • You are more or less correct -- with LR(0), you don't worry about lookaheads at all, so when you would need a lookahead, you instead look at FOLLOW(nonterminal)

    To put things in the terms of the wikipedia article you linked1, when you have an item "I: A → α ·B β, x", with LR(0), x (the lookahead) is always empty. So when computing the FOLLOW(I), instead of using x when FIRST(β) contains epsilon, you use FOLLOW(A) from the grammar.

    So the equation you end up with is

    if ε ∈ FIRST(β)
      FOLLOW(I) = (FIRST(β) - {ε}) ∪ FOLLOW(A)
    else
      FOLLOW(I) = FIRST(β)


    1You need to be somewhat careful with the details on that wiki page -- when they show "the full item set 0 for LR(1)", they actually have the full item set for LALR(1), without really talking about what the difference between LR(1) and LALR(1) is