I am using the code below to calculate the azimuthal standard deviation of a 2D array converted from an image. However, the code takes a couple of minutes to run. Can anyone suggest how to make it faster?
import numpy as np

def radial_profile(data, center):
    y, x = np.indices(data.shape)
    r = np.sqrt((x - center[0])**2 + (y - center[1])**2)
    r = r.astype(np.int64)
    radialprofile = np.zeros(np.max(r) + 1)
    for i in range(np.max(r) + 1):
        radialprofile[i] = np.std(data[r == i])
    return radialprofile

data = np.random.randint(10, size=(4096, 4096))
std_azi = radial_profile(data, [2048, 2048])
CPU times: total: 1min 1s
Wall time: 1min 1s
As the CPU and wall times show, it takes at least a minute to run this. I have 10,000 such astronomical images that need to be processed, so any suggestion on how to make this code faster would be highly appreciated.
The main issue is that data[r == i] is evaluated more than 2,000 times. This is very inefficient, especially since data is quite big.
One could use a group-by strategy so as to compute all the groups at once. That being said, NumPy does not support group-by directly. AFAIK, there are packages for that, but we can design a more efficient implementation based on the specific properties of this code (e.g. the group indices are relatively small integers). We can implement this in Numba. For the sake of numerical accuracy, we need a two-pass implementation: first compute the mean of each group, then the variance relative to that mean.
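As a side note, a pure-NumPy two-pass version can be built on np.bincount, which does support weighted group-by sums (the function name radial_profile_bincount here is an illustrative assumption, not part of the original code):

```python
import numpy as np

def radial_profile_bincount(data, center):
    # Integer radius of each pixel, flattened to 1D group indices
    y, x = np.indices(data.shape)
    r = np.sqrt((x - center[0])**2 + (y - center[1])**2).astype(np.int64).ravel()
    vals = data.ravel().astype(np.float64)
    # Pass 1: per-radius mean via group-by sums
    counts = np.bincount(r)
    means = np.bincount(r, weights=vals) / counts
    # Pass 2: per-radius variance around the group mean
    sq_dev = (vals - means[r])**2
    variance = np.bincount(r, weights=sq_dev) / counts
    return np.sqrt(variance)
```

This avoids the repeated boolean masking, though the trigonometry-free Numba version below is still faster since it never materializes the index arrays.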
Here is the resulting code:
import numpy as np
import numba as nb

@nb.njit('(int32[:,::1], UniTuple(int32,2))')
def radial_profile(data, center):
    dh, dw = data.shape
    cx, cy = center
    rMax = np.int64(np.sqrt(max(cx**2 + cy**2, cx**2 + (dh-cy)**2, (dw-cx)**2 + cy**2, (dw-cx)**2 + (dh-cy)**2)))

    # Step 1: compute the mean by group
    sumByGroup = np.zeros(rMax + 1)
    groupSize = np.zeros(rMax + 1, dtype=np.int64)
    for y in range(dh):
        for x in range(dw):
            r = np.int64(np.sqrt((x - cx)**2 + (y - cy)**2))
            sumByGroup[r] += data[y, x]
            groupSize[r] += 1
    meanByGroup = sumByGroup / groupSize

    # Step 2: compute the variance by group based on the mean by group
    sqSumByGroup = np.zeros(rMax + 1)
    for y in range(dh):
        for x in range(dw):
            r = np.int64(np.sqrt((x - cx)**2 + (y - cy)**2))
            sqSumByGroup[r] += (data[y, x] - meanByGroup[r])**2
    varianceByGroup = sqSumByGroup / groupSize
    return np.sqrt(varianceByGroup)

std_azi = radial_profile(data, (2048, 2048))
Here are the results on my machine with an i5-9600KF processor:
Initial code: 33154 ms
New code: 63 ms
The new code is about 526 times faster than the initial one.
Note that this code is sequential, but each image can be processed in parallel. The resulting parallel code will scale very well since the computation is clearly compute-bound.
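A simple way to exploit that per-image parallelism is a process pool with one worker per core. This is a minimal sketch assuming the images are stored as .npy files; radial_profile_np is a pure-NumPy stand-in for the compiled function above, and the file layout and helper names are assumptions for illustration:

```python
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def radial_profile_np(data, center):
    # Pure-NumPy stand-in for the compiled Numba version (same two-pass result)
    y, x = np.indices(data.shape)
    r = np.sqrt((x - center[0])**2 + (y - center[1])**2).astype(np.int64).ravel()
    vals = data.ravel().astype(np.float64)
    counts = np.bincount(r)
    means = np.bincount(r, weights=vals) / counts
    variance = np.bincount(r, weights=(vals - means[r])**2) / counts
    return np.sqrt(variance)

def process_one(path):
    # Hypothetical loader: replace with your FITS/image reader
    data = np.load(path)
    h, w = data.shape
    return radial_profile_np(data, (w // 2, h // 2))

def process_all(paths, workers=None):
    # One OS process per worker: each image is independent, so a
    # compute-bound profile scales with the number of cores
    with ProcessPoolExecutor(max_workers=workers) as ex:
        return list(ex.map(process_one, paths))
```

With 10,000 images, the per-process startup and Numba compilation costs are amortized quickly, so the speedup should be close to the core count.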