I just had a bug caused by np.sum and an equivalent (or at least I thought so...) np.einsum command not giving the same result. Here is an example:
import numpy as np
import matplotlib.pyplot as plt
array = np.random.randint(-10000, 10000, size=(4, 100, 200, 600), dtype=np.int16)
sum1 = np.sum(array, axis=(0,1,2))
sum2 = np.einsum('aijt->t', array)
print(np.allclose(sum1, sum2))
plt.figure()
plt.plot(sum1)
plt.plot(sum2)
plt.show()
After some searching, I found that this is due to overflow of the int16 data type.
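The overflow can be confirmed by comparing the result dtypes (a minimal sketch, continuing the snippet above; the exact np.sum dtype may depend on the platform):
# np.sum has promoted the int16 input to a wider integer type,
# while np.einsum has kept int16, so its accumulation wraps around.
print(sum1.dtype)   # int64 (or int32, depending on platform)
print(sum2.dtype)   # int16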
My questions:
Why is np.einsum not giving the same result as np.sum here? I feel the np.sum behaviour is a lot more desirable, leading to fewer errors.
Why does np.einsum not throw an overflow error, or at least a warning? This is super scary in terms of getting hidden bugs when using it. Should I be checking for those by hand every time I use the command?

Define a large int16:
In [322]: y=np.int16(32000)
Addition produces a warning:
In [323]: y+y
C:\Users\paul\AppData\Local\Temp\ipykernel_8828\1714217578.py:1: RuntimeWarning: overflow encountered in short_scalars
y+y
Out[323]: -1536
np.sum promotes them to a larger int, with no warning:
In [324]: np.sum((y,y))
Out[324]: 64000
In [325]: _.dtype
Out[325]: dtype('int32')
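The same promotion shows up with an array like the one in the question. A rough sketch (the promoted dtype is assumed to be the platform default integer):
import numpy as np

arr = np.random.randint(-10000, 10000, size=(4, 100, 200, 600), dtype=np.int16)
# np.sum accumulates small integer inputs in the platform default integer
print(np.sum(arr, axis=(0, 1, 2)).dtype)    # int64 (int32 on Windows)
# np.einsum keeps the input dtype, so int16 can silently wrap around
print(np.einsum('aijt->t', arr).dtype)      # int16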
Make an array from that:
In [326]: Y = np.array(y)
Overflow without warning:
In [327]: Y+Y
Out[327]: -1536
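The -1536 is just the int16 value wrapping around modulo 2**16: 32000 + 32000 = 64000, which lies outside the int16 range [-32768, 32767]. A quick check with plain Python integers:
total = 32000 + 32000                       # 64000, above the int16 maximum of 32767
wrapped = (total + 2**15) % 2**16 - 2**15   # map back into the signed 16-bit range
print(wrapped)                              # -1536, the same value numpy returns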
I don't recall the details, but it's been explained that checking each element of an array for overflow is (or was) considered too expensive.
Rather than checking 'by hand', just be aware of the overflow possibility, and don't use smaller dtypes unnecessarily.
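If the data has to be stored as int16, one way to sidestep the wrap-around is to ask for a wider accumulator explicitly. A rough sketch (both np.sum and, as far as I know, np.einsum accept a dtype argument):
import numpy as np

arr = np.random.randint(-10000, 10000, size=(4, 100, 200, 600), dtype=np.int16)

# Cast up front, or request a wider computation dtype in the reduction itself
sum_sum    = np.sum(arr, axis=(0, 1, 2), dtype=np.int64)
sum_einsum = np.einsum('aijt->t', arr, dtype=np.int64)
sum_cast   = np.einsum('aijt->t', arr.astype(np.int64))

print(np.array_equal(sum_sum, sum_einsum) and np.array_equal(sum_sum, sum_cast))  # True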
A possible duplicate