I have a NumPy 3D array that is just a list of grayscale images:
images = np.zeros((xlen, height, width), dtype=int)
for i in range(xlen):
    images[i] = cv2.imread(filename[i], cv2.IMREAD_GRAYSCALE)
All the images are pretty much the same, but they all have some random noise pixels. My idea is that a noise pixel is the max or min value compared to the same pixel in the other images.
So I need to find, for each pixel position, the highest and lowest values across all images and replace both with the mean of the remaining values.
I implemented that in a naive way using standard Python functions, but that is too slow:
# remove highest and lowest values for each pixel
for el in range(height):
    for em in range(width):
        mylist = []
        for j in range(0, xlen):
            mylist.append(images[j][el][em])
        indmin = mylist.index(min(mylist))
        indmax = mylist.index(max(mylist))
        temp_counterx = 0
        temp_sum = 0
        for j in range(0, xlen):
            if (j != indmin) and (j != indmax):
                temp_counterx += 1
                temp_sum += mylist[j]
        temp_val = int(temp_sum / temp_counterx)
        images[indmin][el][em] = temp_val
        images[indmax][el][em] = temp_val
Is it possible to speed that up using NumPy?
UPD: Accepted the solution proposed by flawr, with some minor changes:
mins = np.min(images, axis=0)
maxs = np.max(images, axis=0)
sums = np.sum(images, axis=0)
# compute the mean without the extremes
mean_without_extremes = (sums - mins - maxs) / (xlen - 2)
mean_without_extremes = mean_without_extremes.astype(int)
# replace minima and maxima with the mean
images = np.where(images == mins, mean_without_extremes, images)
images = np.where(images == maxs, mean_without_extremes, images)
...and got a 30-fold speed increase! It seems that NumPy provides a really fast and powerful computing engine; usage can just be a bit tricky sometimes because of the complex data structures it deals with.
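(For reference, a rough way to time the two versions; slow_denoise and fast_denoise below are just hypothetical wrappers around the loop-based and vectorized code above, and the exact ratio will depend on image size and xlen:)
import time

t0 = time.perf_counter()
out_slow = slow_denoise(images.copy())   # loop-based version from the question
t1 = time.perf_counter()
out_fast = fast_denoise(images.copy())   # vectorized NumPy version
t2 = time.perf_counter()
print(f"loops: {t1 - t0:.3f} s, numpy: {t2 - t1:.3f} s, speedup: {(t1 - t0) / (t2 - t1):.1f}x")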
First of all, to compute things like a mean you probably want to use floating-point numbers instead of integers to start with. So in the following I assume you use those instead.
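For instance (just a sketch along the lines of the loading code in the question), the stack could be allocated as floats right away:
import cv2
import numpy as np

images = np.zeros((xlen, height, width), dtype=np.float64)
for i in range(xlen):
    images[i] = cv2.imread(filename[i], cv2.IMREAD_GRAYSCALE)  # uint8 data is cast to float on assignment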
By using Python loops you're giving away all the advantages of NumPy: the loops are inherently slow, at least compared to the compiled code that runs under the hood when you call NumPy functions. If you want your code to be reasonably fast you should make use of vectorization. Consider the following code, which does what you ask for but without any loops in Python:
# compute minima, maxima and sum
mins = np.min(images, axis=0)
maxs = np.max(images, axis=0)
sums = np.sum(images, axis=0)
# compute the mean without the extremes
mean_without_extremes = (sums - mins - maxs) / (xlen - 2)
# replace minima and maxima with the mean
images[images == mins] = mean_without_extremes.reshape(-1)
images[images == maxs] = mean_without_extremes.reshape(-1)
As you're probably not familiar with that, I recommend reading the introductions in the documentation about indexing and broadcasting in order to use NumPy efficiently.
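For instance, the broadcasting used above can be seen directly (a small illustrative sketch, not part of the solution itself): mins has shape (height, width) while images has shape (xlen, height, width), so comparing them broadcasts mins across the first axis:
mask = images == mins              # mins is broadcast over the image axis
print(mask.shape)                  # (xlen, height, width)
print(mask.sum(axis=0).min())      # at least 1: every pixel attains its minimum in some image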
EDIT: As pointed out in the comments, the solution above only works for xlen > 2 and if the extrema are only attained once per pixel location. This could be fixed by replacing these lines with
# using "np.where" as suggested by OP
images = np.where(images == mins, mean_without_extremes, images)
images = np.where(images == maxs, mean_without_extremes, images)
# we can actually reduce that to one "np.where" call, which might be slightly faster
images = np.where(np.logical_or(images == mins, images == maxs), mean_without_extremes, images)
images[np.isnan(images)] = 0  # set the "empty mean" (the xlen == 2 case) to zero
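An alternative sketch (not part of the solution above) that drops exactly one lowest and one highest sample per pixel even when the extremes are tied, like the original loop does, is to sort along the image axis and average the interior slices; note that this yields a single denoised image rather than rewriting the whole stack:
# requires xlen > 2
sorted_images = np.sort(images, axis=0)          # sort each pixel's values across the stack
denoised = sorted_images[1:-1].mean(axis=0)      # mean of the xlen - 2 middle values per pixel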