
Is there a pure numpy way of performing an operation such as np.min over variable-size bins?


Let's say I have a 1-D array of size n in which each element is associated with a bin index, in increasing order from 0 to m-1 (with m < n), for instance:

    [a1, a2, a3, a4, ..., aN]  # my array
    [0,  1,  1,  1,  ..., m-1] # associated indices

You can think of it as the array being divided into m bins. Importantly, the bins are not necessarily all the same size. The question is how to get the array of size m containing the minimum of each bin. Everything I have thought of involves either a Python-level loop or list, or padding with dummy data. Padding presumably means wasted traversals, and with my actual data it adds a lot of extra dummy entries:
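For concreteness, here is a small made-up instance of this setup (the values in myarray and indices are hypothetical). Assuming indices is sorted and every bin value 0..m-1 occurs at least once, the (begin, end) boundaries of each bin can be recovered with np.searchsorted:

```python
import numpy as np

# Hypothetical toy data: n = 8 elements split into m = 3 bins of sizes 1, 4, 3
myarray = np.array([5.0, 3.0, 9.0, 1.0, 7.0, 4.0, 8.0, 2.0])
indices = np.array([0, 1, 1, 1, 1, 2, 2, 2])  # associated bin index, sorted
m = indices[-1] + 1

# First position of each bin value 0..m-1 (valid only because indices is sorted)
starts = np.searchsorted(indices, np.arange(m), side="left")
# Each bin ends where the next one starts; the last bin ends at n
ends = np.append(starts[1:], len(indices))
bins = list(zip(starts.tolist(), ends.tolist()))
print(bins)  # [(0, 1), (1, 5), (5, 8)]
```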

  1. Simply loop over the bins (here bins is a list of (begin, end) pairs):

        mins = np.zeros(m)
        for i, (begin, end) in enumerate(bins):
            mins[i] = np.min(myarray[begin:end])

  2. Pad the array with dummy data so that all bins have the same size, then reshape it to add an extra dimension. Say I have prepared a boolean mask of size (m * bin_size) that is True for the non-dummy entries, and a fake_array of the same shape initialized to a big constant:

        fake_array[mask] = myarray
        fake_array_2d = fake_array.reshape((m, bin_size))
        mins = np.min(fake_array_2d, axis=1)

  3. Do the calculation in an (m, n) matrix (again initialized to a big number):

        np.put_along_axis(fake_matrix, indices[None, :], myarray, axis=0)  # took me a while to get that one right
        mins = np.min(fake_matrix, axis=1)
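The three approaches can be sketched as one self-contained script on hypothetical toy data. Assumptions not in the question: the data is floating point so np.inf can serve as the "big constant", indices is sorted with no empty bins, and the padding mask is built directly in 2-D (equivalent to the flat mask followed by reshape):

```python
import numpy as np

# Hypothetical toy data: m = 3 bins of sizes 1, 4, 3
myarray = np.array([5.0, 3.0, 9.0, 1.0, 7.0, 4.0, 8.0, 2.0])
indices = np.array([0, 1, 1, 1, 1, 2, 2, 2])
n, m = len(myarray), indices[-1] + 1
BIG = np.inf  # the "big constant"; assumes floating-point data

# Approach 1: Python loop over (begin, end) slices
starts = np.searchsorted(indices, np.arange(m))
ends = np.append(starts[1:], n)
mins_loop = np.zeros(m)
for i, (begin, end) in enumerate(zip(starts, ends)):
    mins_loop[i] = np.min(myarray[begin:end])

# Approach 2: pad every bin to the largest bin size, then reduce along axis 1
counts = np.bincount(indices, minlength=m)
bin_size = counts.max()
# True for the first counts[i] slots of row i, i.e. the non-dummy entries
mask = np.arange(bin_size)[None, :] < counts[:, None]
fake_array_2d = np.full((m, bin_size), BIG)
fake_array_2d[mask] = myarray  # fills rows in order, matching the sorted bins
mins_pad = np.min(fake_array_2d, axis=1)

# Approach 3: scatter element j into row indices[j], column j of an (m, n) matrix
fake_matrix = np.full((m, n), BIG)
np.put_along_axis(fake_matrix, indices[None, :], myarray[None, :], axis=0)
mins_matrix = np.min(fake_matrix, axis=1)

print(mins_loop, mins_pad, mins_matrix)  # each is [5. 1. 2.]
```

This makes the memory overhead explicit: approach 2 allocates m * max(bin size) cells and approach 3 allocates m * n, whereas the loop allocates nothing extra but runs m Python-level iterations.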

Is there a better option?


Solution

  • Taking inspiration from the answer posted by jin-pendragon, I found a solution using np.minimum.reduceat. You need an array of length m containing the starting index of each bin (so its first element is always 0); calling it start_indices, you can then do:

    mins = np.minimum.reduceat(myarray, start_indices)
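A self-contained sketch of this solution, on hypothetical toy data. It assumes the bin-index array is sorted and every bin is non-empty; under that assumption start_indices is 0 plus every position where the bin index changes. This matters because, for reduceat, if start_indices[i] >= start_indices[i+1] the i-th result is simply myarray[start_indices[i]] rather than a reduction. For comparison, the script also shows np.minimum.at, a different technique (unbuffered scatter-min) that does not require sorted indices:

```python
import numpy as np

# Hypothetical toy data: m = 3 bins of sizes 1, 4, 3
myarray = np.array([5.0, 3.0, 9.0, 1.0, 7.0, 4.0, 8.0, 2.0])
indices = np.array([0, 1, 1, 1, 1, 2, 2, 2])  # sorted bin index per element

# Start of each bin: position 0 plus every position where the bin index changes
start_indices = np.concatenate(([0], np.flatnonzero(np.diff(indices)) + 1))
mins = np.minimum.reduceat(myarray, start_indices)
print(start_indices, mins)  # [0 1 5] [5. 1. 2.]

# Alternative: unbuffered scatter-min; works even if indices is not sorted
m = indices.max() + 1
mins_at = np.full(m, np.inf)  # assumes float data so np.inf is a valid identity
np.minimum.at(mins_at, indices, myarray)
```

reduceat does a single pass over contiguous slices, while np.minimum.at is more general but has historically been slower for large inputs.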