Tags: python, numpy, fft, frequency-analysis, audio-analysis

How to Make Sense of Fourier Transform Results in Audio Frequency Analysis


I am doing audio analysis in Python. My end goal is to get a list of frequencies and their respective volumes, like { frequency : volume (0.0 - 1.0) }.

I have my audio data as a list of frames with values between -1.0 and +1.0. I ran NumPy's Fourier transform, numpy.fft.fft(), on this list, but the resulting data makes no sense to me.

I do understand that the Fourier transform converts from the time domain to the frequency domain, but not quite how it works mathematically. That's why I don't quite understand the results.

Thank you. Sorry if my lack of understanding of the Fourier transform made you facepalm.


Solution

  • Consider the FFT of a single period of a sine wave:

    >>> import numpy as np
    >>> t = np.linspace(0, 2*np.pi, 100)
    >>> x = np.sin(t)
    >>> f = np.fft.rfft(x)
    >>> np.round(np.abs(f), 0)
    array([  0.,  50.,   1.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,
             0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,
             0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,
             0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,
             0.,   0.,   0.,   0.,   0.,   0.,   0.])
    

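    The FFT returns an array of complex numbers that encode the amplitude and phase of each frequency component. Assuming you're only interested in the amplitude, I've used np.abs to get the magnitude of each component and rounded it to the nearest integer with np.round(__, 0). You can see the spike at index 1, which indicates a sine wave whose period equals the number of samples. Its height is 50 rather than 1 because the output is unnormalized: a sine of amplitude 1 over n samples shows up with magnitude n/2. The small value at index 2 is spectral leakage: np.linspace includes the endpoint, so the 100 samples span slightly more than one full period.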
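To make the amplitude and phase part concrete, here is a minimal sketch of how both can be read off the complex output. The 2/n scaling and the endpoint=False argument are my additions, not part of the snippet above; the names amplitude and phase are just illustrative.

```python
import numpy as np

n = 100
# endpoint=False makes the samples cover exactly one period (no leakage).
t = np.linspace(0, 2 * np.pi, n, endpoint=False)
x = 0.5 * np.sin(t)  # a sine wave with amplitude 0.5

spectrum = np.fft.rfft(x)

# Magnitude of each bin, rescaled so a sine of amplitude A reads as A.
# (The factor 2 is not correct for bin 0 or the last bin when n is even.)
amplitude = 2 * np.abs(spectrum) / n

# Phase of each bin in radians; a pure sine shows up as -pi/2.
phase = np.angle(spectrum)

print(amplitude[1], phase[1])
```

With this scaling, bin 1 should read close to the 0.5 that went in, which is usually what people mean by "volume".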

    Now make the wave a bit more complex:

    >>> x = np.sin(t) + np.sin(3*t) + np.sin(5*t)
    >>> f = np.fft.rfft(x)
    >>> np.round(np.abs(f), 0)
    array([  0.,  50.,   1.,  50.,   0.,  48.,   4.,   2.,   2.,   1.,   1.,
             1.,   1.,   1.,   1.,   1.,   1.,   1.,   1.,   1.,   1.,   0.,
             0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,
             0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,
             0.,   0.,   0.,   0.,   0.,   0.,   0.])
    

    We now see spikes at indices 1, 3 and 5, corresponding to our input: sine waves with periods of n, n/3 and n/5 (where n is the number of input samples).
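To get from bin indices to the { frequency : volume } mapping asked for in the question, something like the following sketch might work. It assumes you know the sample rate of your audio. The function name frequency_volumes and the 8000 Hz test signal are hypothetical, made up for illustration; np.fft.rfftfreq is the NumPy call that converts bin indices to frequencies in Hz.

```python
import numpy as np

def frequency_volumes(samples, sample_rate):
    """Return {frequency in Hz: amplitude} for a real-valued signal."""
    samples = np.asarray(samples, dtype=float)
    n = len(samples)
    spectrum = np.fft.rfft(samples)
    # Frequency in Hz that each output bin corresponds to.
    freqs = np.fft.rfftfreq(n, d=1.0 / sample_rate)
    # Normalize so a full-scale sine (peak 1.0) reads as 1.0.
    amplitudes = np.abs(spectrum) / n
    amplitudes[1:] *= 2          # fold in the negative-frequency half
    if n % 2 == 0:
        amplitudes[-1] /= 2      # the Nyquist bin has no mirror image
    return dict(zip(freqs.tolist(), amplitudes.tolist()))

# Hypothetical test signal: one second of 440 Hz and 1000 Hz tones.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
signal = 0.5 * np.sin(2 * np.pi * 440 * t) + 0.25 * np.sin(2 * np.pi * 1000 * t)

volumes = frequency_volumes(signal, sample_rate)
loudest = sorted(volumes, key=volumes.get, reverse=True)[:2]
print([(round(f), round(volumes[f], 3)) for f in loudest])
```

Since the input samples lie between -1.0 and +1.0, the amplitudes of individual sine components end up in the 0.0 to 1.0 range. For real audio you would probably run this on short windows rather than the whole file, and tones that do not fit a whole number of cycles into the window will leak into neighbouring bins.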

    EDIT

    Here's a good conceptual explanation of the Fourier transform: http://betterexplained.com/articles/an-interactive-guide-to-the-fourier-transform/