I'm trying to optimize the performance of a script that is full of calls to NumPy's where() in which only the first returned element is actually used. Example:
F = np.where(Y > p/100)[0]
For the huge data sets we are processing, creating a large index array only to discard all but the first element doesn't look like a good solution, in terms of both speed and memory consumption. Is there any way to skip the overhead, maybe by tweaking the condition?
You can use argmax in cases where you only want the first matching item; it returns the index of that item. On a boolean array, recent NumPy versions short-circuit argmax at the first True, so it avoids scanning the rest of the array when a match occurs early. One caveat: if no element satisfies the condition, argmax returns 0, so you have to verify the hit afterwards.
idx = np.argmax(Y > p/100)
# argmax returns 0 when nothing is True, so confirm it is a real hit:
if Y[idx] > p/100:
    F = idx
else:
    F = None
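To make the pattern concrete, here is a minimal self-contained sketch. The helper name `first_index_above` and the sample data are made up for illustration; `Y` and `p` follow the question's naming:

```python
import numpy as np

def first_index_above(Y, threshold):
    """Return the index of the first element of Y above threshold, or None."""
    idx = np.argmax(Y > threshold)  # argmax returns 0 if no element is True
    if Y[idx] > threshold:          # confirm the hit is genuine
        return int(idx)
    return None

Y = np.array([0.1, 0.2, 0.9, 0.3])
p = 50
print(first_index_above(Y, p / 100))   # -> 2 (0.9 is the first value > 0.5)
print(first_index_above(Y, 0.95))      # -> None (no value exceeds 0.95)
```

This is equivalent to `np.where(Y > threshold)[0][0]` when a match exists, but it never materializes the full index array.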