I just realized that CPython seems to treat constant expressions, which represent the same value, differently with respect to constant folding. For example:
>>> import dis
>>> dis.dis('2**66')
1 0 LOAD_CONST 0 (2)
2 LOAD_CONST 1 (66)
4 BINARY_POWER
6 RETURN_VALUE
>>> dis.dis('4**33')
1 0 LOAD_CONST 2 (73786976294838206464)
2 RETURN_VALUE
For the second example constant folding is applied while for the first it is not though both represent the same value. It doesn't seem to be related to the value of the exponent nor to the magnitude of the result since the following expressions are folded as well:
>>> dis.dis('2.0**66')
1 0 LOAD_CONST 2 (7.378697629483821e+19)
2 RETURN_VALUE
>>> dis.dis('4**42')
1 0 LOAD_CONST 2 (19342813113834066795298816)
2 RETURN_VALUE
Why are the first two expressions treated differently and, more generally, what are the specific rules that CPython follows for constant folding?
Tested on:
$ python3.6 --version
Python 3.6.5 :: Anaconda, Inc.
$ python3.7 --version
Python 3.7.1
There are no rules for constant folding. There are only implementation details. They have changed before, and they will change again.
Heck, you can't even talk about the "Python 3 behavior", or the "Python 3.6 behavior", because these implementation details changed between 3.6.4 and 3.6.5. On 3.6.4, the 2**66
example gets constant-folded.
For now, and no one knows how long "for now" will last, the implementation details are that the AST optimizer includes safeguards to prevent spending too much time or memory on constant folding. The safeguard for 2**66
or 4**33
is based on the number of bits in the LHS and the value of the RHS:
if (PyLong_Check(v) && PyLong_Check(w) && Py_SIZE(v) && Py_SIZE(w) > 0) {
size_t vbits = _PyLong_NumBits(v);
size_t wbits = PyLong_AsSize_t(w);
if (vbits == (size_t)-1 || wbits == (size_t)-1) {
return NULL;
}
if (vbits > MAX_INT_SIZE / wbits) {
return NULL;
}
}
MAX_INT_SIZE
is #define
d earlier as 128. Since 2
is a 2-bit number and 4
is a 3-bit number, the estimated result size is smaller for 4**33
, so it passes the check and gets constant-folded.
On Python 3.6.5, the implementation details are mostly similar, but this constant folding happens in the bytecode peephole optimizer instead of the AST optimizer, which doesn't exist on 3.6.5.
On Python 3.6.4, the pre-check safeguard doesn't exist. The peephole optimizer will discard too-large constant-folding results after computing them, which results in different thresholds than the pre-checks.