I am seeing behaviour with numpy bincount that I cannot make sense of. I want to bin the values in a 2D array in a row-wise manner and see the behaviour below. Why would it work with dbArray but fail with simarray?
>>> dbArray
array([[1, 0, 1, 0, 1],
[1, 1, 1, 1, 1],
[1, 1, 0, 1, 1],
[1, 0, 0, 0, 0],
[0, 0, 0, 1, 1],
[0, 1, 0, 1, 0]])
>>> N.apply_along_axis(N.bincount,1,dbArray)
array([[2, 3],
[0, 5],
[1, 4],
[4, 1],
[3, 2],
[3, 2]], dtype=int64)
>>> simarray
array([[2, 0, 2, 0, 2],
[2, 1, 2, 1, 2],
[2, 1, 1, 1, 2],
[2, 0, 1, 0, 1],
[1, 0, 1, 1, 2],
[1, 1, 1, 1, 1]])
>>> N.apply_along_axis(N.bincount,1,simarray)
Traceback (most recent call last):
File "<pyshell#31>", line 1, in <module>
N.apply_along_axis(N.bincount,1,simarray)
File "C:\Python27\lib\site-packages\numpy\lib\shape_base.py", line 118, in apply_along_axis
outarr[tuple(i.tolist())] = res
ValueError: could not broadcast input array from shape (2) into shape (3)
The problem is that bincount
isn't always returning the same shaped objects, in particular when values are missing. For example:
>>> m = np.array([[0,0,1],[1,1,0],[1,1,1]])
>>> np.apply_along_axis(np.bincount, 1, m)
array([[2, 1],
[1, 2],
[0, 3]])
>>> [np.bincount(m[i]) for i in range(m.shape[1])]
[array([2, 1]), array([1, 2]), array([0, 3])]
works, but:
>>> m = np.array([[0,0,0],[1,1,0],[1,1,0]])
>>> m
array([[0, 0, 0],
[1, 1, 0],
[1, 1, 0]])
>>> [np.bincount(m[i]) for i in range(m.shape[1])]
[array([3]), array([1, 2]), array([1, 2])]
>>> np.apply_along_axis(np.bincount, 1, m)
Traceback (most recent call last):
File "<ipython-input-49-72e06e26a718>", line 1, in <module>
np.apply_along_axis(np.bincount, 1, m)
File "/usr/local/lib/python2.7/dist-packages/numpy/lib/shape_base.py", line 117, in apply_along_axis
outarr[tuple(i.tolist())] = res
ValueError: could not broadcast input array from shape (2) into shape (1)
won't.
You could use the minlength
parameter and pass it using a lambda
or partial
or something:
>>> np.apply_along_axis(lambda x: np.bincount(x, minlength=2), axis=1, arr=m)
array([[3, 0],
[1, 2],
[1, 2]])