context-free-grammarformal-languages

Computing leading and trailing sets for context-free grammar


I am seeking for a detailed algorithm describing how to generate leading and trailing sets for non-terminal symbols in context-free grammars.

I've found something like this: https://pl.scribd.com/doc/51358638/16/Operator-Precedence-Relations but I'm unsure about how it works. (See page 20)

Assume, that we have productions:

A -> YaZ | B

B -> b

then, it is said, that Leading(A) = {a}, Leading(A) = Leading(B) and Leading(B)={b}. And I have my doubts about this:

  1. Does it mean, that Leading(A) = {a, b} ? - I assume that in every step we do not replace the set already generated because of '=', but make sum of two sets.
  2. Does it mean, that Leading(B) is {b} or {a, b}? - Does the third rule mean, that we add Leading(B) to Leading(A) or that Leading(A) and Leading(B) are equivalent?

Solution

  • Leading and Trailing are functions specific to generating an operator-precedence parser, which is only applicable if you have an operator precedence grammar. An operator precedence grammar is a special case of an operator grammar, and an operator grammar has the important property that no production has two consecutive non-terminals.

    (An operator precedence grammar is, loosely speaking, an operator grammar which can be parsed with an operator precedence parser :-). But that's not important for now.)

    Given an operator grammar, the function Leading (resp. Trailing) of a non-terminal produces the set of terminals which could be (recursively) the first (resp. last) terminal in some sentential form derived from that non-terminal.

    Another possibly more intuitive definition is that a terminal is in the Leading set for a non-terminal if the terminal is "visible" from the beginning of a production. Non-terminals are "transparent", so a terminal could be visible either by looking through a leading non-terminal or by looking into a visible non-terminal.

    For example, a standard expression grammar (which is an operator grammar; no production has two consecutive non-terminals):

    expr   -> factor '*' expr
    expr   -> factor
    factor -> term '+' factor
    factor -> term
    term   -> ID
    term   -> '(' expr ')'
    

    From term, ID and ( are visible from the beginning, and ID and ) are visible from the end. expr is not visible from either side, because it is hidden by terminals, so we don't need to consider it.

    From factor, + is visible from both ends, and factor also inherits the Leading and Trailing sets of term because term is visible from both ends. (factor is also visible from the end in itself, but that cannot add anything new to the Trailing set.)

    Finally, from expr, * is visible from both ends, and expr inherits from factor.

    So we end up with:

    Non-terminal            Leading             Trailing
    expr                    *, +, ID, (         *, +, ID, )
    factor                  +, ID, (            +, ID, )
    term                    ID, (               ID, )
    

    From that, we're going to construct precedence relations. Basically, if you find

    nonterminal TERMINAL
    

    in any production, then you add the precedence relations TRAIL ⋗ TERMINAL for every TRAIL in Trailing(nonterminal). Similarly, every occurrence of

    TERMINAL nonterminal
    

    generates the relationships TERMINAL ⋖ LEAD for every LEAD in Leading(nonterminal). And finally, if you find

    TERMINAL1 TERMINAL2
    

    or

    TERMINAL1 nonterminal TERMINAL2
    

    then you generate the relationship TERMINAL1 ·=· TERMINAL2.

    Once you've generated all the precedence relationships, you look at every pair of terminals T, U. If at most one precedence relation holds -- that is, T ⋖ U, T ⋗ U, T ·=· U or there is no relationship from T to U -- then you have an operator precedence grammar. (There is no connection between T, U and U, T. Precedence relationships are not antisymmetric and it is unfortunate that they are traditionally spelled with symbols that look like numeric comparison.)