In Python, if a 'not' operator follows a bitwise operator (such as '&' or '|'), the result is a syntax error. Granted, it would be a bitwise operation on a binary value, but that should be OK. There is no such issue in C, as far as I recall.
For example, this works:
a = 0
b = 1
anot = not(a)
bnot = not(b)
c = anot | bnot
but this fails:
c = not(a) | not(b)
these work:
c = not(a) | (not(b))
c = not a | (not b)
Can anyone give me insight as to why this should be? Not looking for workarounds, just an explanation of the implementation. In the meantime, I will struggle through the source code and CFGs to see if I can learn more. So far, I have not found any similar question on the Stacks or other Googles. Thanks!
The Python grammar does clearly indicate what's going on. (I edited out the long list of different comparison operators, which are all the same except for the non-terminal name and the operator itself.)
inversion:
| 'not' inversion
| comparison
comparison:
| bitwise_or compare_op_bitwise_or_pair+
| bitwise_or
compare_op_bitwise_or_pair:
| eq_bitwise_or
# ...
eq_bitwise_or: '==' bitwise_or
# ...
bitwise_or:
| bitwise_or '|' bitwise_xor
| bitwise_xor
bitwise_xor:
| bitwise_xor '^' bitwise_and
| bitwise_and
bitwise_and:
| bitwise_and '&' shift_expr
| shift_expr
So, the operand for not must be a comparison, or something down the precedence chain from comparison. And the operands for | must be bitwise_or (bitwise_xor on the right), or something down the precedence chain from those. Since bitwise_or is further down the chain than not, a bitwise_or expression can be the operand of not, but a not expression cannot be either of the operands of |.
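You can watch the grammar produce this structure with the ast module; on recent CPython versions the dump looks roughly like the comment below, with the BinOp nested inside the UnaryOp:
import ast

# not 0 | 1 parses as not (0 | 1): the '|' operation sits under the 'not'
print(ast.dump(ast.parse("not 0 | 1", mode="eval").body))
# UnaryOp(op=Not(), operand=BinOp(left=Constant(value=0), op=BitOr(), right=Constant(value=1)))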
So not 0 | 1 means not (0 | 1), because 0 | 1 can be the operand of not while not 0 cannot be an operand of |. And 0 | not 1 is a syntax error because not 1 cannot be an operand of | and there's no other way of parsing the expression.
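You can confirm the second claim directly; the parser rejects the expression outright:
# not 1 cannot be an operand of '|', so this never parses:
try:
    compile("0 | not 1", "<demo>", "eval")
except SyntaxError as e:
    print("SyntaxError:", e.msg)   # typically "invalid syntax"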
Note that this is not the same as C. In C, all of the unary prefix operators bind more tightly than any binary operator, so !0|1 means (!0) | 1, which is 1. That's the opposite of the Python expression not 0 | 1, which is False.
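A quick check of the Python side:
print(not 0 | 1)   # not (0 | 1) -> not 1 -> False
# the superficially similar C expression !0|1 evaluates to (!0)|1, i.e. 1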
Of course, that's not an explanation for why the Python grammar is written that way, and I'm not in a position to give a complete historical account of the reasoning. Apparently, it was considered desirable that not a < b mean not (a < b), rather than (not a) < b. The latter interpretation would very rarely be desired, so it makes a certain amount of sense. Also, that's consistent with how the other boolean operators work; a < b and b < c does in fact mean what a naïve reader would probably expect. And that's true in C as well: a < b && b < c doesn't need to be parenthesised to provide the intended parse. (But note that in C, & and | are not in the same place in the precedence list as Python's operators with the same names.)
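To make that concrete, with values chosen so the two parses actually differ:
a, b = 1, 2
print(not a < b)     # not (1 < 2) -> False
print((not a) < b)   # False < 2, i.e. 0 < 2 -> True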
So that's all somewhat understandable, but the question is why the grammar is written so as to prohibit unambiguous expressions like 1 | not a, which can only be parsed in one way regardless of precedence. Here, I can only guess.
Certainly, it is possible to write a grammar which allows unambiguous expressions like that. But it's not easy if you limit yourself to simple BNF (or even the extended BNF variant now used in the Python grammar). The problem is that the cascading-precedence style doesn't allow loops: if the precedences don't form a consistent partial order, the parser reports ambiguities. On the other hand, if you use a Yacc/Bison-like parser generator, or any of the many operator-precedence parsing techniques you'll find by searching for that phrase, then it's not difficult at all. So the restriction is probably a consequence of the decision to use a parser generator without precedence-based disambiguation.
The kind of ambiguity created by lower-precedence unary operators is the one people usually run into when they try to write a grammar for a language with let expressions: "let" <var> "=" <expr> "in" <expr>. In that construct, the second <expr> is greedy: it extends as far as it can be extended. But there's no obvious reason why the let expression itself shouldn't be legal on the right-hand side of an operator:
z = 3 * let x = 6 in x - 1/6
The let expression evaluates to 35/6 (that is, 6 - (1/6)), so there's every reason to believe that z will be 17.5, rather than the parser reporting a syntax error. With a naively written grammar, though, you get either the syntax error or some odd ambiguity report. You get the syntax error when the grammar implements let in the same way that Python implements not, and for the same reason: the let expression cannot be the operand of *, on either side.
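Python itself has a construct with exactly this shape: lambda, whose body is greedy on the right and which the grammar refuses to accept as a bare operand. So Python resolves the would-be ambiguity the same way a naive let grammar does, with a syntax error:
try:
    compile("3 * lambda x: x - 1/6", "<demo>", "eval")
except SyntaxError as e:
    print("SyntaxError:", e.msg)
print(3 * (lambda x: x - 1/6)(6))   # parenthesised form works: 3 * (6 - 1/6) = 17.5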
If you try to modify the cascading-precedence grammar to allow let on the right-hand side of *, you typically end up with a new ambiguity: when the parser reaches the -, it has the choice of terminating the multiplication, (3 * let x = 6 in x) - 1/6, or letting the let absorb the rest of the expression, 3 * (let x = 6 in x - 1/6). I don't think most human readers would expect the first parse, although you never know, but a parser doesn't operate with human intuitions. (That's usually a good thing.)
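For what it's worth, Python's conditional expression is a low-precedence construct that the grammar does allow in operand position, and it takes the first of those two parses: the multiplication is terminated before the conditional is considered.
print(3 * 2 if False else 5)    # parsed as (3 * 2) if False else 5 -> 5
print(3 * (2 if False else 5))  # grouping the other way -> 15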
This is trivial with an operator-precedence parser, because all you need to do is define let with the highest precedence on the left and the lowest precedence on the right. The let operator itself stays on the parser stack until the parser is forced to pop it off, because it reaches the end of the expression or a close parenthesis, which effectively "hides" the precedence of the * operator. Thus, everything works as expected.
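Here's a minimal sketch of that idea in Python (a toy parser invented for this answer, not CPython's): a Pratt-style operator-precedence parser for a little language with numbers, identifiers, '-', '*', '/', parentheses, and let. In prefix position, let is always accepted regardless of the caller's binding power (highest precedence on the left), and its body is parsed with binding power 0 (lowest precedence on the right), so it absorbs everything up to the end of the expression or a close parenthesis.
import re

# Toy Pratt parser: numbers, identifiers, '-', '*', '/', parentheses,
# and 'let <var> = <expr> in <expr>'.
def tokenize(src):
    return re.findall(r"\d+|\w+|[*/()=-]", src)

class Parser:
    INFIX_BP = {"-": 10, "*": 20, "/": 20}   # infix binding powers

    def __init__(self, src):
        self.toks, self.pos = tokenize(src), 0

    def peek(self):
        return self.toks[self.pos] if self.pos < len(self.toks) else None

    def take(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def parse(self, min_bp=0):
        tok = self.take()
        if tok == "let":                 # prefix position: always accepted,
            name = self.take()           # regardless of the caller's min_bp
            assert self.take() == "="
            value = self.parse()         # stops at 'in' (not an infix op)
            assert self.take() == "in"
            left = ("let", name, value, self.parse())  # greedy body: bp 0
        elif tok == "(":
            left = self.parse()
            assert self.take() == ")"
        elif tok.isdigit():
            left = int(tok)
        else:
            left = ("var", tok)
        # only infix operators binding at least as tightly as min_bp continue
        while self.peek() in self.INFIX_BP and self.INFIX_BP[self.peek()] >= min_bp:
            op = self.take()
            left = (op, left, self.parse(self.INFIX_BP[op] + 1))
        return left

print(Parser("3 * let x = 6 in x - 1/6").parse())
# ('*', 3, ('let', 'x', 6, ('-', ('var', 'x'), ('/', 1, 6))))
Note that the let node is built before the infix loop runs, so by the time the - is seen, it is compared against the body's binding power of 0 rather than against *; that's the "hiding" described above.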