pythonfftfrequencyspectrumaudacity

Normalizing FFT spectrum magnitude to 0dB


I'm using FFT to extract the amplitude of each frequency components from an audio file. Actually, there is already a function called Plot Spectrum in Audacity that can help to solve the problem. Taking this example audio file which is composed of 3kHz sine and 6kHz sine, the spectrum result is like the following picture. You can see peaks are at 3KHz and 6kHz, no extra frequency.

enter image description here

Now I need to implement the same function and plot the similar result in Python. I'm close to the Audacity result with the help of rfft but I still have problems to solve after getting this result.

enter image description here

  1. What's physical meaning of the amplitude in the second picture?
  2. How to normalize the amplitude to 0dB like the one in Audacity?
  3. Why do the frequency over 6kHz have such high amplitude (≥90)? Can I scale those frequency to relative low level?

Related code:

import numpy as np
from pylab import plot, show
from scipy.io import wavfile

sample_rate, x = wavfile.read('sine3k6k.wav')
fs = 44100.0

rfft = np.abs(np.fft.rfft(x))
p = 20*np.log10(rfft)
f = np.linspace(0, fs/2, len(p))

plot(f, p)
show()

Update

I multiplied Hanning window with the whole length signal (is that correct?) and get this. Most of the amplitude of skirts are below 40.

enter image description here

And scale the y-axis to decibel as @Mateen Ulhaq said. The result is more close to the Audacity one. Can I treat the amplitude below -90dB so low that it can be ignored?

Updated code:

fs, x = wavfile.read('input/sine3k6k.wav')
x = x * np.hanning(len(x))

rfft = np.abs(np.fft.rfft(x))
rfft_max = max(rfft)
p = 20*np.log10(rfft/rfft_max)
f = np.linspace(0, fs/2, len(p))

enter image description here


About the bounty

With the code in the update above, I can measure the frequency components in decibel. The highest possible value will be 0dB. But the method only works for a specific audio file because it uses rfft_max of this audio. I want to measure the frequency components of multiple audio files in one standard rule just like Audacity does.

I also started a discussion in Audacity forum, but I was still not clear how to implement my purpose.


Solution

  • After doing some reverse engineering on Audacity source code here some answers. First, they use Welch algorithm for estimating PSD. In short, it splits signal to overlapped segments, apply some window function, applies FFT and averages the result. Mostly as This helps to get better results when noise is present. Anyway, after extracting the necessary parameters here is the solution that approximates Audacity's spectrogram:

    import numpy as np
    from scipy.io import wavfile
    from scipy import signal
    from matplotlib import pyplot as plt
    
    segment_size = 512
    
    fs, x = wavfile.read('sine3k6k.wav')
    x = x / 32768.0  # scale signal to [-1.0 .. 1.0]
    
    noverlap = segment_size / 2
    f, Pxx = signal.welch(x,                        # signal
                          fs=fs,                    # sample rate
                          nperseg=segment_size,     # segment size
                          window='hanning',         # window type to use
                          nfft=segment_size,        # num. of samples in FFT
                          detrend=False,            # remove DC part
                          scaling='spectrum',       # return power spectrum [V^2]
                          noverlap=noverlap)        # overlap between segments
    
    # set 0 dB to energy of sine wave with maximum amplitude
    ref = (1/np.sqrt(2)**2)   # simply 0.5 ;)
    p = 10 * np.log10(Pxx/ref)
    
    fill_to = -150 * (np.ones_like(p))  # anything below -150dB is irrelevant
    plt.fill_between(f, p, fill_to )
    plt.xlim([f[2], f[-1]])
    plt.ylim([-90, 6])
    # plt.xscale('log')   # uncomment if you want log scale on x-axis
    plt.xlabel('f, Hz')
    plt.ylabel('Power spectrum, dB')
    plt.grid(True)
    plt.show()
    

    Some necessary explanations on parameters:

    enter image description here

    What's physical meaning of the amplitude in the second picture?

    It is basically amount of energy in the frequency bin.

    How to normalize the amplitude to 0dB like the one in Audacity?

    You need choose some reference point. Graphs in decibels are always relevant to something. When you select maximum energy bin as a reference, your 0db point is the maximum energy (obviously). It is acceptable to set as a reference energy of the sine wave with maximum amplitude. See ref variable. Power in sinusoidal signal is simply squared RMS, and to get RMS, you just need to divide amplitude by sqrt(2). So the scaling factor is simply 0.5. Please note that factor before log10 is 10 and not 20, this is because we are dealing with power of signal and not amplitude.

    Can I treat the amplitude below -90dB so low that it can be ignored?

    Yes, anything below -40dB is usually considered negligeble