I am using the following code to digitize an array into 16 bins:
numpy.digitize(array, bins=numpy.histogram(array, bins=16)[1])
I expect that the output is in the range [1, 16], since there are 16 bins. However, one of the values in the returned array is 17. How can this be explained?
This is actually documented behaviour of numpy.digitize():
Each index i returned is such that bins[i-1] <= x < bins[i] if bins is monotonically increasing, or bins[i-1] > x >= bins[i] if bins is monotonically decreasing. If values in x are beyond the bounds of bins, 0 or len(bins) is returned as appropriate.
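To see the quoted rule on a small case, here is a minimal sketch. The values in bins and x are made up for illustration and are not from the question.

```python
import numpy as np

# Hypothetical example data: three increasing bin edges.
bins = np.array([0.0, 1.0, 2.0])
x = np.array([-0.5, 0.0, 0.5, 1.0, 2.0, 3.0])

# With the default right=False, index i satisfies bins[i-1] <= x < bins[i].
idx = np.digitize(x, bins)
print(idx)  # [0 1 1 2 3 3]
```

The value -0.5 is below bins[0] and gets 0, while 2.0 and 3.0 are at or above the last edge and get len(bins), which is 3 here.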
So in your case, 0 and 17 are also valid return values (note that the bin array returned by numpy.histogram() has length 17). The bins returned by numpy.histogram() cover the range array.min() to array.max(). The condition given in the docs shows that array.min() belongs to the first bin, while array.max() lies outside the last bin, which is why 0 is not in the output while 17 is.
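A short sketch that reproduces this and shows one possible way to get indices in [1, 16]. The random test data and the names edges, idx and idx_fixed are assumptions for illustration; dropping the last edge is just one workaround, not the only one.

```python
import numpy as np

# Hypothetical input data standing in for the question's array.
rng = np.random.default_rng(0)
array = rng.normal(size=1000)

# 16 bins means 17 edges, running from array.min() to array.max().
edges = np.histogram(array, bins=16)[1]
idx = np.digitize(array, bins=edges)

print(len(edges))                      # 17
print(int(idx.min()), int(idx.max()))  # 1 17

# Only the maximum sits on the last edge, so only it is assigned 17.
print(bool(np.all(array[idx == 17] == array.max())))  # True

# Passing all edges except the last puts array.max() into bin 16.
idx_fixed = np.digitize(array, bins=edges[:-1])
print(int(idx_fixed.min()), int(idx_fixed.max()))  # 1 16
```

With the last edge removed, everything at or above edges[15] is assigned 16, so the maximum ends up in the last bin, which is also where numpy.histogram() counts it.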