parsinglr-grammarlanguage-implementation

Precedence/associativity implmentation


I'm currently implementing a LR(k) parser interpreter, just for fun.

I'm trying to implement the precedence and associativity.

I got a little stuck when it came to how to assign associativity and precedence for the 'action' part i.e. what the precedence and associativity should be for the reduction.

if we got a production

E -> 
  | E + E { action1 }
  | E * E { action2 } 
  | (E)   { action3 }
  | ID    { action4 }

it should be quit clear that action1 should have the same associativity and precedence as + and action2 should have the same as *. But in general we can not just assume that a rule in a production has only one symbol which has a precedence. A toy example

E -> E + E - E { action }

where - and + are some arbetrary operators, having some precedence and associativity. Should the action be associated with -, because it precedes the last E?

I know the rules for how to choose between shift/reduce, that is not what I ask for.


Solution

  • The classic precedence algorithm, as implemented by yacc (and many derivatives) uses the last non-terminal in each production to define its default precedence. That's not always the desired precedence for the production, so parser-generators typically also provide their users with a mechanism for explicitly specifying the precedence of a production.

    This precedence model has proven to be useful, and while it is not without its problems -- see below -- it is probably the best implementation for a simple parser generator, if only because its quirks are at least documented.

    This convention has perpetuated the idea that precedence is a feature of non-terminals (or "operators"). That's valid if you're building an operator-precedence parser but it does not correspond to LR(k) parsing. At best, it's a crude approximation, which can be highly misleading.

    If the underlying grammar really is an operator precedence grammar -- that is, no production has two consecutive terminals and the imputed precedence relationships are unambiguous -- then it might be an acceptable approximation, although it's worth noting that operator precedence relationships are not transitive so they cannot usually be summarised as monotonic comparisons. But many uses of yacc-style precedence are well outside of this envelope, and can even lead to serious grammar bugs.

    The problem is that modelling precedence as a simple transitive comparison between tokens can lead to the precedence declarations being used to disambiguate (and thereby hide) unrelated conflicts. In short, the use of precedence declarations in LR parsing is basically a hack. It's a useful hack, and sometimes beneficial -- as you say, it can reduce the number of states and the frequency of unit reductions -- but it needs to be approached with caution.

    Indeed, some people have proposed an alternative model of precedence based on grammar rewriting. (See, for example, the 2013 paper by Ali Afroozeh et al., “Safe Specification of Operator Precedence Rules”). This model is considerably more precise, but partially as a consequence of this precision, it is not as amenable to (mis)-use for other purposes, such as the resolution of the dangling-else conflict.