parsinggrammaroperator-precedenceformal-languagestla+

What algorithms exist for parsing a language where operator precedence is defined as a range?


The language TLA+ uses ranges for its operator precedence (see the table on page 271 in the book Specifying Systems [PDF]). Quote:

The relative precedence of two operators is unspecified if their ranges overlap.

So for example the $ operator (precedence 9-13) conflicts with the + operator (precedence 10-10) but not the < operator (precedence 5-5).

Are operator precedence ranges a commonly-used or even pre-existing concept in formal languages? I can't find anything about this online from some searching. Are there algorithms for parsing such a precedence scheme, or translating this scheme into a standard single-value precedence scheme? Do any parser generators handle operator precedence ranges? What is gained through this approach compared to single-value precedence levels?


Solution

  • Are operator precedence ranges a commonly-used or even pre-existing concept in formal languages?

    No.

    In fact, operator precedence (without ranges) is a seriously misunderstood concept in formal languages. (When I finish writing this treatise, you'll be able to see below for a longer version of this statement.)

    Are there algorithms for parsing such a precedence scheme, or translating this scheme into a standard single-value precedence scheme?

    You could adapt the standard shunting yard algorithm pretty easily, by just using a more complicated precedence comparison. In the standard algorithm, comparison can have three possible outcomes: less than (push the incoming symbol), greater than (reduce the expression on the left), and equal (combine parentheses or other bracketing tokens). (The last outcome is not well described in most online resources, and is often clouded by adding an additional test for associativity. Again, see below.)

    With range comparison, you would have four possible results. You would compare the end of the left-hand range with the beginning of the right-hand range for less-than and the beginning of the left-hand range with the end of the right-hand range for greater-than. If neither of these applied you would have overlapping ranges, and you would have to somehow distinguish bracketing tokens from the error case.

    Do any parser generators handle operator precedence ranges?

    Only a few parser generators use operator precedence tables these days, although hand-written implementations of the shunting yard algorithm (or its equivalent) are pretty common. Parser generators like yacc/bison use the precedence declarations at parser construction time to resolve conflicts by eliminating some possible parsing actions; the parser itself has no idea about precedence and operates strictly according to a state transition table.

    ANTLR is the parser generator which comes the closest to using precedence during the parse. It computes its own precedence numbers (separately for each non-terminal) based on the order of productions in the grammar description. These computed precedence levels are translated into semantic predicates which are executed at parse time in order to select between different parse predictions. Since there are no explicit precedence declarations, you wouldn't have anywhere to declare a range, but you could write the semantic predicates yourself instead of letting ANTLR generate them for you (if you have explicit predicates, it doesn't attempt to insert conflict resolution code). That would allow you to use the same strategy as I outlined above, but I don't think it would be totally satisfactory.

    What is gained through this approach compared to single-value precedence levels?

    In my opinion, nothing. It's a hack, like most of the rest of precedence parsing. If it works for a particular grammar, that's cool, but it will suffer from the same problems which generalised operator precedence parsing suffers from: you don't really know what language you are parsing because the whole apparatus is more like a heuristic than an formal framework. As demonstration of that, it's hard to find a hand-written precedence-based parser which doesn't at some point have some (hopefully commented) code which handles the exceptions not correctly handled by precedence comparisons, just as it is hard to find a real-life grammar using precedence declarations in any but the most vanilla style which doesn't at some place have a long comment explaining an otherwise mysterious declaration.