
What is the warning 'Empty filters detected in mel frequency basis.' about?


I'm trying to extract 13 MFCCs from an audio file with the code below:

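import librosa as l

# Load the file, resampled to 8 kHz
x, sr = l.load('/home/user/Data/Audio/Tracks/Dev/FS_P01_dev_001.wav', sr=8000)
n_fft = int(sr * 0.02)    # 20 ms window -> 160 samples
hop_length = n_fft // 2   # 50% overlap -> 80 samples
# Recent librosa versions require the audio to be passed by keyword (y=...)
mfccs = l.feature.mfcc(y=x, sr=sr, n_mfcc=13, hop_length=hop_length, n_fft=n_fft)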

But it shows the warning below. What does it mean, and how do I get rid of it?

UserWarning: Empty filters detected in mel frequency basis. Some channels will produce empty responses. Try increasing your sampling rate (and fmax) or reducing n_mels.
  warnings.warn('Empty filters detected in mel frequency basis. '

Solution

  • MFCCs are based on mel spectrograms, which in turn are usually based on the discrete Fourier transform (DFT). The Fourier transform takes a signal from the time domain and converts it into the frequency domain: N time-domain samples become N frequency-domain values. For a real-valued signal these are symmetric, so only N/2 + 1 of them are distinct. Just as the time-domain samples lie on a linear time scale, the frequency-domain samples lie on a linear frequency scale. In contrast, the mel scale is not linear but (approximately) logarithmic.

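    You need to know the following about the Fourier transform. When you have a signal with a sampling rate of F_s = 8000 Hz and a window length of N, the highest frequency you can represent is F_s/2 = 4000 Hz (the Nyquist frequency), and the frequency resolution, i.e. the spacing between neighboring DFT bins, is Δf = F_s/N. With your N = n_fft = 160, that is Δf = 50 Hz.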
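
For the parameters in the question, the Nyquist frequency F_s/2 and the bin spacing Δf = F_s/N can be worked out directly. This is a minimal arithmetic sketch using only the values from the question; the variable names `nyquist`, `delta_f` and `n_bins` are chosen here for illustration.

```python
# Parameters taken from the question.
sr = 8000                  # sampling rate F_s in Hz
n_fft = int(sr * 0.02)     # window length N: 20 ms -> 160 samples

nyquist = sr / 2           # highest representable frequency in Hz
delta_f = sr / n_fft       # spacing between neighboring DFT bins in Hz
n_bins = n_fft // 2 + 1    # non-redundant bins for a real-valued signal

print(f"N = {n_fft}, Nyquist = {nyquist:.0f} Hz, "
      f"delta_f = {delta_f:.0f} Hz, bins = {n_bins}")
```

So the 20 ms window yields only 81 frequency bins spaced 50 Hz apart, while librosa's default of n_mels=128 asks for 128 mel bands.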

    Now consider how MFCCs are computed:

    1. Take the Fourier transform of (a windowed excerpt of) a signal.
    2. Map the powers of the spectrum obtained above onto the mel scale, using triangular overlapping windows.
    3. Take the logs of the powers at each of the mel frequencies.
    4. Take the discrete cosine transform of the list of mel log powers, as if it were a signal.
    5. The MFCCs are the amplitudes of the resulting spectrum.
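
The five steps above can be sketched for a single frame with NumPy. This is an illustration under stated assumptions, not librosa's implementation: it uses a Hann window, the common HTK-style mel formula (librosa defaults to a different, Slaney-style scale), a natural log with a small floor, and an unnormalized DCT-II. The helper names `mel_filterbank` and `mfcc_frame` are hypothetical.

```python
import numpy as np

def hz_to_mel(f_hz):
    # HTK-style mel formula (an assumption; not librosa's default scale).
    return 2595.0 * np.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular, overlapping filters as an (n_mels, n_fft // 2 + 1) matrix."""
    fft_freqs = np.arange(n_fft // 2 + 1) * sr / n_fft
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, fft_freqs.size))
    for m in range(n_mels):
        lo, mid, hi = edges[m], edges[m + 1], edges[m + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        fb[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def mfcc_frame(frame, sr, n_mels=20, n_mfcc=13):
    n_fft = len(frame)
    # Step 1: power spectrum of the windowed frame.
    power = np.abs(np.fft.rfft(frame * np.hanning(n_fft))) ** 2
    # Step 2: map the powers onto the mel scale with triangular filters.
    mel_power = mel_filterbank(sr, n_fft, n_mels) @ power
    # Step 3: log of the mel powers (small floor avoids log(0)).
    log_mel = np.log(mel_power + 1e-10)
    # Step 4: DCT-II of the log mel powers, written out explicitly.
    n = np.arange(n_mels)
    k = np.arange(n_mfcc)[:, None]
    dct_basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_mels))
    # Step 5: the resulting amplitudes are the MFCCs.
    return dct_basis @ log_mel

# One 20 ms frame of a 440 Hz tone at 8 kHz, matching the question's window.
frame = np.sin(2 * np.pi * 440.0 * np.arange(160) / 8000)
coeffs = mfcc_frame(frame, sr=8000)
print(coeffs.shape)
```

Note that the number of mel bands (n_mels) and the number of coefficients kept (n_mfcc) are separate parameters: step 2 uses the former, steps 4 and 5 the latter.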

    In step 2 you have to map whatever your DFT produced onto a different scale, the mel scale. If the DFT resolution Δf is too coarse for the (potentially) finer mel scale, this does not work: some mel filters are narrower than the spacing between DFT bins, contain no bin at all, and therefore always produce an empty response. Think of it like an image: when you have a coarse image, you cannot increase its quality by mapping it to a higher resolution. This means you have to ensure that your DFT resolution Δf is fine enough for the mel bands you want to use.
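
To make the mismatch concrete, the sketch below counts how many triangular mel filters contain no DFT bin at all. It assumes the HTK-style mel formula, so the exact counts may differ from what librosa (which defaults to a Slaney-style scale) reports; the function name `count_empty_filters` is hypothetical.

```python
import math

def hz_to_mel(f_hz):
    # HTK-style mel formula (an assumption; not librosa's default scale).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def count_empty_filters(sr, n_fft, n_mels):
    """Count mel filters whose support contains no DFT bin frequency."""
    delta_f = sr / n_fft
    bin_freqs = [k * delta_f for k in range(n_fft // 2 + 1)]
    # n_mels + 2 band edges, equally spaced on the mel scale from 0 to sr/2.
    mel_max = hz_to_mel(sr / 2.0)
    edges = [mel_to_hz(i * mel_max / (n_mels + 1)) for i in range(n_mels + 2)]
    empty = 0
    for m in range(n_mels):
        lower, upper = edges[m], edges[m + 2]
        # A triangular filter is non-zero only strictly between its outer edges.
        if not any(lower < f < upper for f in bin_freqs):
            empty += 1
    return empty

for n_fft, n_mels in [(160, 128), (160, 20), (2048, 128)]:
    n_empty = count_empty_filters(8000, n_fft, n_mels)
    print(f"n_fft={n_fft}, n_mels={n_mels}: {n_empty} empty filters")
```

With the question's 160-sample window, 128 bands leave some filters empty, whereas either reducing the bands to 20 or lengthening the window to 2048 samples leaves none.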

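    To ensure this, you have to use either a longer window N (n_fft) or fewer mel bands (n_mels, which defaults to 128 in librosa and can be passed to librosa.feature.mfcc as a keyword argument). Note that n_mfcc only sets how many cepstral coefficients are kept after the DCT; it does not change the number of mel bands. The problem at the heart of this is that you cannot have both high frequency resolution and high temporal resolution at the same time.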

    See also IRCAM Intro on FFT parameters.