I have a nested array of shape: [200, 500, 1000].
Each index represents a coordinate of an image, eg array[1, 2, 3] would give me the value of the array at x=1, y=2, and z=3 in coordinate space. I have repeating values in the array in the range from 1-20,000. My goal is to find the x, y, and z coordinates of each of these values (they occur multiple times). Normally, I'd just loop over the array and ask at each point
current_values = np.where(arr==current_index)
However, it's slow. So I tried cupy instead (current_values = cp.where(arr==current_index)
), but it sometimes doesn't work at around the same index. For example, current_values = cp.where(arr==current_index)
yields an empty array at current_index==760 on one run, and current_index==780 on another run. Keep in mind I am using the same data, so I am quite puzzled by this behaviour. The amount of errors is small I assume, because this only happens thrice in the loop, but it obviously still impacts my accuracy.
I have two questions:
A completely different solution:
array = np.random.randint(1, 20_001, (200, 500, 1000))
flat_array = array.reshape(-1) # does not copy array
sorted_index = np.argsort(flat_array)
# Suppose you want to find all elements that have value 100
left = bisect.bisect_left(sorted_index, 100, key=lambda x: flat_array[x])
right = bisect.bisect_right(sorted_index, 100, key=lambda x: flat_array[x])
At this point, flat_array[sorted_index[i]] = 100
for all values of i
such that left <= i < right
.
Each sorted_index[i]
can be converted into a 3-dimensional index using:
np.unravel_index(sorted_index[i], array.shape)
So you also that that
array[np.unravel_index(sorted_index[i], array.shape)] = 100
for that same range of indices.
Note: By default np.argsort
uses an unstable sorting algorithm. If it's important that the indices be in increasing order, add kind='stable'
to the sort call.
Changed to use np.unravel_index
as suggested by @hpaulj. Fixed typo where I used temp
instead of sorted_index
.