This Python 3.12.7 script with numpy 2.2.4:
import numpy as np
a = np.random.randint(0, 256, (500, 500)).astype(np.uint8)
counts, bins = np.histogram(a, range(0, 255, 25))
print(np.column_stack((counts, bins[:-1], bins[1:])))
counts, bins = np.histogram(a, range(0, 257, 16))
print(np.column_stack((counts, bins[:-1], bins[1:])))
produces this kind of output:
[[24721 0 25]
[24287 25 50]
[24413 50 75]
[24441 75 100]
[24664 100 125]
[24390 125 150]
[24488 150 175]
[24355 175 200]
[24167 200 225]
[25282 225 250]]
[[15800 0 16]
[15691 16 32]
[15640 32 48]
[15514 48 64]
[15732 64 80]
[15506 80 96]
[15823 96 112]
[15724 112 128]
[15629 128 144]
[15681 144 160]
[15661 160 176]
[15558 176 192]
[15526 192 208]
[15469 208 224]
[15772 224 240]
[15274 240 256]]
where the first histogram always has the highest count in bin [225, 250). The second histogram indicates a uniform distribution, as expected. I tried a dozen times and the anomaly was always there. Can someone explain this behavior?
I think the docs explain pretty well what's happening, but the explanation is spread across two different places. First, range(0, 255, 25) is supplying the bins parameter, not the range parameter, so its values 0, 25, ..., 250 are used as the bin edges. Secondly, the Notes section states:
All but the last (righthand-most) bin is half-open. In other words, if bins is:
[1, 2, 3, 4]
then the first bin is [1, 2) (including 1, but excluding 2) and the second [2, 3). The last bin, however, is [3, 4], which includes 4.
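The example from the Notes can be checked directly. This is a minimal sketch with made-up data, in which the value 4 appears twice so that the closed right edge is visible:

```python
import numpy as np

# Four edges define three bins: [1, 2), [2, 3) and [3, 4].
data = np.array([1, 2, 3, 4, 4])  # illustrative data, not from the question
counts, edges = np.histogram(data, bins=[1, 2, 3, 4])

# The two 4s fall into the last bin together with the 3.
print(counts)  # [1 1 3]
```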
Pretty sure the extra counts in your case are the number of elements that equal 250. This makes sense: the last bin covers 26 integer values (225 through 250) instead of 25, so its count should be about 1/25th higher than in the other bins.
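One way to check this is to count the last bin by hand. The following is a minimal sketch, assuming a seeded np.random.default_rng generator in place of np.random.randint; the seed and the variable names are illustrative and not taken from the question:

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
a = rng.integers(0, 256, size=(500, 500), dtype=np.uint8)

edges = list(range(0, 255, 25))  # 0, 25, ..., 250
counts, _ = np.histogram(a, bins=edges)

# Count the last bin as if it were half-open, then add the elements equal to 250.
half_open = np.count_nonzero((a >= 225) & (a < 250))
equal_250 = np.count_nonzero(a == 250)
print(counts[-1] == half_open + equal_250)  # True

# Values 251..255 lie beyond the last edge and are not counted at all.
print(counts.sum() == np.count_nonzero(a <= 250))  # True
```

The second histogram in the question does not show the effect because its last edge, 256, is a value that uint8 data never takes, so the closed right edge adds nothing there.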