javascriptbisoncontext-free-grammarjison

How to make a chained comparison in Jison (or Bison)


I'm working on a expression parser made in Jison, which supports basic things like arithmetics, comparisons etc. I want to allow chained comparisons like 1 < a < 10 and x == y != z. I've already implemented the logic needed to compare multiple values, but I'm strugling with the grammar – Jison keeps grouping the comparisons like (1 < a) < 10 or x == (y != z) and I can't make it recognize the whole thing as one relation.

This is roughly the grammar I have:

expressions = e EOF

e = Number
  | e + e
  | e - e
  | Relation  %prec '=='
  | ...

Relation = e RelationalOperator Relation  %prec 'CHAINED'
  | e RelationalOperator Relation         %prec 'NONCHAINED'

RelationalOperator = '==' | '!=' | ...

(Sorry, I don't know the actual Bison syntax, I use JSON. Here's the entire source.)

The operator precedence is roughly: NONCHAINED, ==, CHAINED, + and -.

I have an action set up on e → Relation, so I need that Relation to match the whole chained comparison, not only a part of it. I tried many things, including tweaking the precedence and changing the right-recursive e RelationalOperator Relation to a left-recursive Relation RelationalOperator e, but nothing worked so far. Either the parser matches only the smallest Relation possible, or it warns me that the grammar is ambiguous.


If you decided to experiment with the program, cloning it and running these commands will get you started:

git checkout develop
yarn
yarn test

Solution

  • There are basically two relatively easy solutions to this problem:

    1. Use a cascading grammar instead of precedence declarations.

      This makes it relatively easy to write a grammar for chained comparison, and does not really complicate the grammar for binary operators nor for tight-binding unary operators.

      You'll find examples of cascading grammars all over the place, including most programming languages. A reasonably complete example is seen in this grammar for C expressions (just look at the grammar up to constant_expression:).

      One of the advantages of cascading grammars is that they let you group operators at the same precedence level into a single non-terminal, as you try to do with comparison operators and as the linked C grammar does with assignment operators. That doesn't work with precedence declarations because precedence can't "see through" a unit production; the actual token has to be visibly part of the rule with declared precedence.

      Another advantage is that if you have specific parsing needs for chained operators, you can just write the rule for the chained operators accordingly; you don't have to worry about it interfering with the rest of the grammar.

      However, cascading grammars don't really get unary operators right, unless the unary operators are all at the top of the precedence hierarchy. This can be seen in Python, which uses a cascading grammar and has several unary operators low in the precedence hierarchy, such as the not operator, leading to the following oddity:

      >>> if False == not True: print("All is well")
        File "<stdin>", line 1
          if False == not True: print("All is well")
                      ^
      SyntaxError: invalid syntax
      

      That's a syntax error because == has higher precedence than not. The cascading grammar only allows an expression to appear as the operand of an operator with lower precedence than any operator in the expression, which means that the expression not True cannot be the operand of ==. (The precedence ordering allows not a == b to be grouped as not (a == b).) That prohibition is arguably ridiculous, since there is no other possible interpretation of False == not True other than False == (not True), and the fact that the precedence ordering forbids the only possible interpretation makes the only possible interpretation a syntax error. This doesn't happen with precedence declarations, because the precedence declaration is only used if there is more than one possible parse (that is, if there is really an ambiguity).

      Your grammar puts not at the top of the precedence hierarchy, although it should really share that level with unary minus rather than being above unary minus [Note 1]. So that's not an impediment to using a cascading grammar. However, I see that you also want to implement an if … then … else operator, which is syntactically a low-precedence prefix operator. So if you wanted 4 + if x then 0 else 1 to have the value 5 when x is false (rather than being a syntax error), the cascading grammar would be problematic. You might not care about this, and if you don't, that's probably the way to go.

    2. Stick with precedence declarations and handle the chained comparison as an exception in the semantic action.

      This will allow the simplest possible grammar, but it will complicate your actions a bit. To implement it, you'll want to implement the comparison operators as left-associative, and then you'll need to be able to distinguish in the semantic actions between a comparison (which is a list of expressions and comparison operators) from any other expression (which is a string). The semantic action for a comparison operator needs to either extend or create the list, depending on whether the left-hand operand is a list or a string. The semantic action for any other operator (including parenthetic grouping) and for the right-hand operand in a comparison needs to check if it has received a list, and if so compile it into a string.

    Whichever of those two options you choose, you'll probably want to fix the various precedence errors in the existing grammar, some of which were already present in your upstream source (like the unary minus / not confusion mentioned above). These include:


    Notes

    1. If you used a cascading precedence grammar, you'd see that only one of - not 0 and not - 0 is parseable. Of course, both are probably type violations, but that's not something the syntax should concern itself with.