Calculate exponential complex sum with fft instead of summation to simulate diffraction?

Context

I am trying to understand x-ray diffraction a little better by coding it up in python. For a collection of points with positions R_i, the Debye formula goes

where the i in the exponential is for the complex number, all other i's are for indices and for now b_i = b_j = 1, for simplicity.,

Now I tried explicitly calculating this sum for a collection of points of which I have the coordinates

import numpy as np
# set up grid
dims = 2
side = 30
points = np.power(side, dims)
coords = np.zeros((dims, points)) 

xc, yc = np.meshgrid(np.arange(side), np.arange(side))
coords[0, :] = xc.reshape((points))
coords[1, :] = yc.reshape((points))


# calculate diffraction
xdist = np.subtract.outer(coords[0], coords[0])
ydist = np.subtract.outer(coords[1], coords[1])
rdist = np.stack((xdist, ydist))
rdist = rdist.reshape(2, rdist.shape[1]*rdist.shape[2])

qs = 200
qspace = np.stack((np.linspace(-2, 8, qs), np.zeros(qs)))
diffrac = np.sum(np.exp(-1j * np.tensordot(qspace.T, rdist, axes=1)), axis=1)

Which gave me the following after a couple seconds

This looks as expected (periodicity of 2 pi, as the dots have spacing 1). It also makes sense that this takes some time: for 900 points, 810000 distances have to be calculated. I don't use loops so I think the code is not that bad in terms of efficiency, but just the fact that I'm calculating this sum manually seems inherently slow.

Thoughts

Now it looks as if things would speed up greatly if I could use a discrete fast fourier transform for this - given the shape of the sum. However:

for the discrete fourier transform, I would still need to 'pixelate' the image (as far as I understand), to include a lot of empty space in between the points in my signal. As if I were to transform the pixels of the first image I shared. This also seems less efficient (e.g. because of the sampling).
I would like to move points around afterwards, so the fact that the first image is a grid and thus sampled regularly is not particularly helpful. It looks as if non-uniform fourier transformations could help me, but still that would require me to 'pixelate' the image and set some values to 0.

Question

Is there a way to use FFT (or another method) to calculate the sum faster, starting from a list of np.array coordinates (x,y)? (of dirac delta functions, if you want so...).

Specifically pointers at relevant mathematical techniques/python functions/python packages would be appreciated. I'm not that familiar using using fourier transforms for actual applications, but most of the material I find online seems irrelevant. So probably I'm looking in the wrong direction, or something's lacking in my understanding. All help is appreciated!

(the first image is a screenshot from https://www.ill.eu/fileadmin/user_upload/ILL/6_Careers/1_All_our_vacancies/PhD_recruitment/Student_Seminars/2017/19-2017-05-09_Fischer_Cookies.pdf, as it seems there's no math notation on SO or I did not find it))

Solution

This answer provides a solution to make the code more efficient so it fully uses the computing power of your CPU and so to make it significantly faster.

More than 90% of the time is spent in np.exp because computing the experiential of complex numbers is very expensive.

One solution to speed this up is to use multiple threads (since Numpy does not use multiple threads). On top of that, we can also use a faster implementation of np.exp (typically leveraging SIMD units of the CPU). Both can be done easily with Numexpr.

Then, we can speed up the np.tensordot operation using the matrix multiplication qspace.T @ rdist since the Numpy implementation is not efficient.

import numexpr as ne

# Equivalent of the last line of the code:
tmp1 = qspace.T @ rdist
tmp2 = ne.evaluate('exp(-1j * tmp1)')
diffrac = np.sum(tmp2, axis=1)

Performance evaluation

Here are performance results on my i5-9600KF CPU (6 cores):

Initial code:         9.3 s
New proposed code:    1.1 s

Thus, the new implementation is 8.5 times faster. Most of the time is still spent in computing the exponential of complex numbers (>60%).