fftcufftscikit-cuda

How to interpret cuFFT R2C result


I'm in the process of accelerating some data analysis code with GPU and am currently doing some profiling and comparisons between the numpy.fft library and cuFFT (using the skcuda.fft wrapper).

I'm certain I'm just missing something obvious about the FFT implementation in cuFFT, but I'm struggling to find what it is in the cuFFT documentation.

To setup the problem I create 500 ms of data sampled at 100 MS/s with a few spectral components. Then, I declare the GPU arrays, the cufft plan (R2C) and run the fft with a subset of the data. Finally, I have a comparison with numpy.fft.rfft:

time = np.linspace(0, 500E-3, int(50E6))
freq = 100E6

data = np.sin(2*np.pi*time*1E6)
data += np.sin(2*np.pi*time*2E6 + 0.5)
data += np.sin(2*np.pi*time*3E6 - 0.1)
data += np.sin(2*np.pi*time*4E6 - 0.9)
data += np.sin(2*np.pi*time*1E6 - 1.9)
data += np.sin(2*np.pi*time*15E6 - 2.1)
data += np.sin(2*np.pi*time*20E6 - 0.3)
data += np.sin(2*np.pi*time*25E6 - 0.3)

nPtsFFT = int(2**13)

dDev = gp.GPUArray(nPtsFFT, np.float64)
dDev.set(data[:nPtsFFT])

rDev = gp.GPUArray(int(nPtsFFT/2+1), np.float64)

plan = cufft.Plan(nPtsFFT, np.float64, np.complex128)
cufft.fft(dDev, rDev, plan)
rHost = rDev.get()

freqs = np.fft.rfftfreq(nPtsFFT, 1/freq)
hfftRes = np.fft.rfft(data[:nPtsFFT])

plt.loglog(freqs, np.abs(hfftRes), label='npfft')
plt.loglog(freqs, np.abs(rHost), label='cufft')
plt.legend()
plt.show()

I naively assumed that these would be approximately equal, but what I find is that the cufft peaks are all shifted and every other point is lower than expected.

Naive comparison of cuFFT and npfft

This reminded me of the output of scipy.fftpack.rfft, so checking the documentation there I found the interleaving of Re and Im parts. So, if I modify the plotting to:

plt.loglog(freqs, np.abs(hfftRes), label='npfft')
plt.loglog(freqs[:-1:2]/2, np.abs(rHost[:-1:2] + 1j*rHost[1::2]), label='cufft')
plt.legend()
plt.show()

I now get what I expect but only up to 25 MHz, when I should be able to get data up to 50 MHz given the sampling rate.

Comparison compensating for interleave Re and Im output

Is there a way to extract the data up to the Nyquist frequency from this transform?


Solution

  • Since the R2C interface produces complex outputs you have to provide an array of np.complex128 types to get the entire int(nPtsFFT/2+1) complex values, not just int(nPtsFFT/2+1) floating point values (which would corresponds to only half the quantity of data).

    This can be done by changing the rDev definition as follows (and keeping everything else the same):

    rDev = gp.GPUArray(int(nPtsFFT/2+1), np.complex128)
    
    plan = cufft.Plan(nPtsFFT, np.float64, np.complex128)
    cufft.fft(dDev, rDev, plan)
    rHost = rDev.get()
    
    freqs = np.fft.rfftfreq(nPtsFFT, 1/freq)
    hfftRes = np.fft.rfft(data[:nPtsFFT])
    
    plt.loglog(freqs, np.abs(hfftRes), label='npfft')
    plt.loglog(freqs, np.abs(rHost), label='cufft')
    plt.legend()
    plt.show()
    

    The result should then go all the way up to 50MHz Nyquist frequency as expected, with the spikes lining up nicely with the reference np.fft.rfft implementation.