Tags: python, parsing, compiler-errors, binary-operators

Python fails to bind not operator to unary operand


In Python, if a 'not' operator follows a bitwise operator (such as '&' or '|'), the result is a syntax error. Granted that it will be a bitwise operation on a binary value, but that should be OK. There is no such issue in C, as far as I recall.

For example, this works:

a = 0
b = 1
anot = not(a)
bnot = not(b)
c = anot | bnot

but this fails:

c = not(a) | not(b)

these work:

c = not(a) | (not(b))   
c = not a | (not b)  
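For reference, the behaviour can be reproduced without running anything, by handing each variant to the built-in compile() (this is just a sketch to confirm which forms the parser accepts):

```python
# Check which variants the parser accepts, using compile() so the
# SyntaxError is caught rather than aborting the script.
for src in ("not(a) | (not(b))",   # accepted: right-hand `not` parenthesised
            "not a | (not b)",     # accepted for the same reason
            "not(a) | not(b)"):    # rejected: `not` cannot be an operand of `|`
    try:
        compile(src, "<test>", "eval")
        print(f"{src!r}: OK")
    except SyntaxError:
        print(f"{src!r}: SyntaxError")
```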

Can anyone give me insight as to why this should be? Not looking for workarounds, just an explanation of the implementation. In the meantime, I will struggle through the source code and CFGs to see if I can learn more. So far, I have not found any similar question on the Stacks or other Googles. Thanks!


Solution

  • The Python grammar does clearly indicate what's going on. (I have edited out the long list of comparison operators, which are all the same except for the non-terminal name and the operator itself.)

    inversion:
        | 'not' inversion 
        | comparison
    comparison:
        | bitwise_or compare_op_bitwise_or_pair+ 
        | bitwise_or
    compare_op_bitwise_or_pair:
        | eq_bitwise_or
        # ...
    eq_bitwise_or: '==' bitwise_or
    # ...
    bitwise_or:
        | bitwise_or '|' bitwise_xor 
        | bitwise_xor
    bitwise_xor:
        | bitwise_xor '^' bitwise_and 
        | bitwise_and
    bitwise_and:
        | bitwise_and '&' shift_expr 
        | shift_expr
    

    So, the operand for not must be a comparison, or something down the precedence chain from comparison. And the operands for | must be bitwise_or (bitwise_xor on the right) or something down the precedence chain for those. Since bitwise_or is further down the chain than not, a bitwise_or expression can be the operand of not but a not expression cannot be either of the operands of |.

    So not 0 | 1 means not (0 | 1), because 0 | 1 can be the operand of not while not 0 cannot be an operand of |. And 0 | not 1 is a syntax error because not 1 cannot be an operand of | and there's no other way of parsing the expression.
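    Both claims are easy to confirm with the ast module (a quick sketch; nothing here depends on anything beyond the grammar shown above):

```python
import ast

# `not 0 | 1` parses as not (0 | 1): the UnaryOp is the root of the
# tree and the BinOp is its operand.
tree = ast.parse("not 0 | 1", mode="eval")
print(ast.dump(tree.body))

# The whole expression is therefore not (0 | 1) == not 1 == False:
print(not 0 | 1)

# `0 | not 1` has no valid parse at all:
try:
    ast.parse("0 | not 1", mode="eval")
except SyntaxError as e:
    print("SyntaxError:", e.msg)
```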

    Note that this is not the same as C. In C, all of the unary prefix operators bind more tightly than any binary operator, so !0|1 means (!0) | 1, which is 1. That's opposite to the Python expression not 0 | 1, which is False.

    Of course, that's not an explanation for why the Python grammar is written that way, and I'm not in a position to give a complete historic account of the reasoning. Apparently, it was considered desirable that

        not a < b
    

    mean not (a < b), rather than (not a) < b. The latter interpretation would very rarely be desired, so it makes a certain amount of sense. Also, that's consistent with how the other boolean operators work; a < b and b < c does in fact mean what a naïve reader would probably expect. And that's true in C, as well: a < b && b < c doesn't need to be parenthesised to provide the intended parse. (But note that in C, & and | are not in the same place in the precedence list as Python's operators with the same names.)
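    The two readings really do give different answers, which a quick sketch makes concrete (the values 1 and 2 are chosen only because they make the readings disagree):

```python
a, b = 1, 2

# Python's actual parse: not (a < b) -> not True -> False
print(not a < b)

# The other conceivable reading, (not a) < b:
# not 1 is False, and False compares as 0, so False < 2 is True.
print((not a) < b)
```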

    So that's all somewhat understandable, but the question is why the grammar is written so as to prohibit unambiguous expressions like 1 | not a, which can only be parsed in one way regardless of precedence. Here, I can only guess.

    Certainly, it is possible to write a grammar which allows unambiguous expressions like that. But it's not easy, if you're limiting yourself to simple BNF (or even the extended BNF variant now used in the Python grammar). The problem is that the cascading precedence style doesn't allow loops; if precedences don't form a consistent partial order, the parser reports ambiguities. On the other hand, if you use a Yacc/Bison-like parser generator, or any of the many operator-precedence parsing techniques you'll find by searching for that phrase, then it's not difficult at all. So the decision to use a parser generator without precedence-based disambiguation is probably related to the implementation.

    The kind of ambiguity you run into with lower precedence unary operators is the following, which people usually run into when they try to write a grammar for languages which include let expressions: "let" <var> "=" <expr> "in" <expr>. In that construct, the second <expr> is greedy: it extends as far as it can be extended. But there's no obvious reason why the let expression itself shouldn't be legal on the right-hand side of an operator:

    z = 3 * let x = 6 in x - 1/6
    

    The let expression evaluates to 35/6 (that is, 6 - (1 / 6)), so there's every reason to believe that z will be 17.5, rather than the parser reporting a syntax error. With a naively-written grammar, though, you either get the syntax error or some odd ambiguity report. You get the syntax error when the grammar implements let in the same way that Python implements not, and for the same reason: the let expression cannot be the operand of *, on either side.

    If you try to modify the cascading precedence grammar to allow let on the right-hand side of *, you typically end up with a new ambiguity; when the parser reaches the -, it has the choice of terminating the multiplication, (3 * let x = 6 in x) - 1/6, or letting the let absorb the rest of the expression, 3 * (let x = 6 in x - 1/6). I don't think most human readers would expect the first parse, although you never know; in any case, a parser doesn't operate with human intuitions. (That's usually a good thing.)

    This is trivial with an operator-precedence parser, because all you need to do is to define let with highest precedence on the left and lowest precedence on the right. The let operator itself stays on the parser stack until the parser is forced to pop it off, because it reaches the end of the expression or a close parenthesis, which effectively "hides" the precedence of the * operator. Thus, everything works as expected.
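    A toy precedence-climbing parser makes the point concrete. This is emphatically not how CPython's parser works; the token handling and binding-power values are invented for the sketch. Giving the prefix operator a low right binding power makes it greedy (so not 0 | 1 groups as not (0 | 1)), while the unambiguous 1 | not 0 is accepted for free, because the recursive call for the right operand of | simply sees a prefix operator and parses it:

```python
# Toy precedence-climbing parser over integers, `|`, `not` and parentheses.
# Produces nested tuples as a stand-in for an AST.

def tokenize(src):
    return src.replace("(", " ( ").replace(")", " ) ").split()

BINARY_BP = {"|": (30, 31)}   # (left, right) binding power of `|`
NOT_RBP = 10                  # `not` binds its operand *loosely*

def parse(tokens, min_bp=0):
    tok = tokens.pop(0)
    if tok == "not":
        # Low right binding power: `not` swallows as much as it can,
        # so `not 0 | 1` becomes ('not', ('|', 0, 1)).
        lhs = ("not", parse(tokens, NOT_RBP))
    elif tok == "(":
        lhs = parse(tokens, 0)
        tokens.pop(0)  # discard the ")"
    else:
        lhs = int(tok)
    while tokens and tokens[0] in BINARY_BP:
        lbp, rbp = BINARY_BP[tokens[0]]
        if lbp < min_bp:
            break
        op = tokens.pop(0)
        # On the right of `|`, a leading `not` is unambiguous, so the
        # recursive call accepts it: 1 | not 0 -> ('|', 1, ('not', 0)).
        lhs = (op, lhs, parse(tokens, rbp))
    return lhs

print(parse(tokenize("not 0 | 1")))   # ('not', ('|', 0, 1))
print(parse(tokenize("1 | not 0")))   # ('|', 1, ('not', 0))
```

    The same trick, applied to a binary construct like let with a high left and low right binding power, gives the greedy behaviour described above without outlawing let on the right of *.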