I am trying to understand x-ray diffraction a little better by coding it up in python. For a collection of points with positions R_i, the Debye formula goes
where the i in the exponential is for the complex number, all other i's are for indices and for now b_i = b_j = 1
, for simplicity.,
Now I tried explicitly calculating this sum for a collection of points of which I have the coordinates
import numpy as np
# set up grid
dims = 2
side = 30
points = np.power(side, dims)
coords = np.zeros((dims, points))
xc, yc = np.meshgrid(np.arange(side), np.arange(side))
coords[0, :] = xc.reshape((points))
coords[1, :] = yc.reshape((points))
# calculate diffraction
xdist = np.subtract.outer(coords[0], coords[0])
ydist = np.subtract.outer(coords[1], coords[1])
rdist = np.stack((xdist, ydist))
rdist = rdist.reshape(2, rdist.shape[1]*rdist.shape[2])
qs = 200
qspace = np.stack((np.linspace(-2, 8, qs), np.zeros(qs)))
diffrac = np.sum(np.exp(-1j * np.tensordot(qspace.T, rdist, axes=1)), axis=1)
Which gave me the following after a couple seconds
This looks as expected (periodicity of 2 pi, as the dots have spacing 1). It also makes sense that this takes some time: for 900 points, 810000 distances have to be calculated. I don't use loops so I think the code is not that bad in terms of efficiency, but just the fact that I'm calculating this sum manually seems inherently slow.
Now it looks as if things would speed up greatly if I could use a discrete fast fourier transform for this - given the shape of the sum. However:
Is there a way to use FFT (or another method) to calculate the sum faster, starting from a list of np.array coordinates (x,y)? (of dirac delta functions, if you want so...).
Specifically pointers at relevant mathematical techniques/python functions/python packages would be appreciated. I'm not that familiar using using fourier transforms for actual applications, but most of the material I find online seems irrelevant. So probably I'm looking in the wrong direction, or something's lacking in my understanding. All help is appreciated!
(the first image is a screenshot from https://www.ill.eu/fileadmin/user_upload/ILL/6_Careers/1_All_our_vacancies/PhD_recruitment/Student_Seminars/2017/19-2017-05-09_Fischer_Cookies.pdf, as it seems there's no math notation on SO or I did not find it))
This answer provides a solution to make the code more efficient so it fully uses the computing power of your CPU and so to make it significantly faster.
More than 90% of the time is spent in np.exp
because computing the experiential of complex numbers is very expensive.
One solution to speed this up is to use multiple threads (since Numpy does not use multiple threads). On top of that, we can also use a faster implementation of np.exp
(typically leveraging SIMD units of the CPU). Both can be done easily with Numexpr.
Then, we can speed up the np.tensordot
operation using the matrix multiplication qspace.T @ rdist
since the Numpy implementation is not efficient.
import numexpr as ne
# Equivalent of the last line of the code:
tmp1 = qspace.T @ rdist
tmp2 = ne.evaluate('exp(-1j * tmp1)')
diffrac = np.sum(tmp2, axis=1)
Here are performance results on my i5-9600KF CPU (6 cores):
Initial code: 9.3 s
New proposed code: 1.1 s
Thus, the new implementation is 8.5 times faster. Most of the time is still spent in computing the exponential of complex numbers (>60%).