librosa mfcc

What are the components of the MFCC output in librosa?


Looking at the output of this code:

mfccs = librosa.feature.mfcc(y=librosa_audio, sr=librosa_sample_rate, n_mfcc=40)
print("MFCC Shape = ", mfccs.shape)

I get MFCC Shape = (40, 1876). What do these two numbers represent? I looked at the librosa website but still could not work out what these two values mean.

Any insights will be greatly appreciated!


Solution

  • The first dimension (40) is the number of MFCC coefficients, and the second dimension (1876) is the number of time frames. The number of coefficients is set by n_mfcc, and the number of time frames is roughly the length of the audio (in samples) divided by hop_length. With librosa's defaults (hop_length=512 and centered frames), it is exactly 1 + floor(n_samples / hop_length).
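
    As a quick sanity check, the frame count can be predicted without running librosa at all. The sketch below assumes librosa's default framing (hop_length=512, n_fft=2048, center=True); the helper name expected_mfcc_shape and the 960,000-sample signal length are made up for illustration and are not taken from the question's audio file.

```python
def expected_mfcc_shape(n_samples, n_mfcc=40, hop_length=512, n_fft=2048, center=True):
    """Predict the (n_mfcc, n_frames) shape for a signal of n_samples samples.

    Hypothetical helper: the defaults mirror librosa's default framing,
    with n_mfcc=40 as used in the question.
    """
    if center:
        # Centered frames: the signal is padded by n_fft // 2 on each side,
        # so frame t is centered on sample t * hop_length.
        n_frames = 1 + n_samples // hop_length
    else:
        # Uncentered frames: only full windows that fit inside the signal.
        n_frames = 1 + (n_samples - n_fft) // hop_length
    return (n_mfcc, n_frames)


# Assumed signal length, chosen so the result matches the shape in the question.
print(expected_mfcc_shape(960_000))  # (40, 1876)
```

    If you pass a different hop_length to librosa.feature.mfcc, the second dimension changes accordingly, while the first always follows n_mfcc.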

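    To understand the meaning of the MFCCs themselves, you should understand the steps it takes to compute them: the signal is split into short overlapping frames, a power spectrum is computed for each frame, the spectrum is passed through a mel-scaled filter bank, the logarithm of the filter-bank energies is taken, and finally a discrete cosine transform (DCT) is applied, keeping the first n_mfcc coefficients.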

    A good written explainer is Haytham Fayek's "Speech Processing for Machine Learning: Filter banks, Mel-Frequency Cepstral Coefficients (MFCCs) and What's In-Between", and a good video explainer is The Sound of AI's "Mel-Frequency Cepstral Coefficients Explained Easily".
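
    To make the two dimensions concrete, here is a rough NumPy-only sketch of the usual MFCC steps (framing, power spectrum, mel filter bank, log, DCT). It is a simplified illustration, not librosa's implementation: the function names are hypothetical, it uses the HTK-style mel formula, a plain natural log, and an unnormalized DCT-II, so the values will not match librosa.feature.mfcc. Only the output shape is meant to agree.

```python
import numpy as np


def hz_to_mel(f):
    # HTK-style mel formula (an assumption; librosa defaults to a different variant).
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    # Triangular filters whose edges are evenly spaced on the mel scale.
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def simple_mfcc(y, sr, n_mfcc=40, n_fft=2048, hop_length=512, n_mels=128):
    # 1. Pad so frames are centered, then slice into overlapping windowed frames.
    y = np.pad(y, n_fft // 2, mode="reflect")
    n_frames = 1 + (len(y) - n_fft) // hop_length
    idx = np.arange(n_fft)[None, :] + hop_length * np.arange(n_frames)[:, None]
    frames = y[idx] * np.hanning(n_fft)              # (n_frames, n_fft)
    # 2. Power spectrum of each frame.
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2  # (n_frames, n_fft // 2 + 1)
    # 3. Mel filter-bank energies.
    mel_energy = power @ mel_filterbank(sr, n_fft, n_mels).T  # (n_frames, n_mels)
    # 4. Log compression (small offset avoids log(0)).
    log_mel = np.log(mel_energy + 1e-10)
    # 5. DCT-II over the mel axis, keeping the first n_mfcc coefficients.
    n = np.arange(n_mels)
    k = np.arange(n_mfcc)[:, None]
    dct_basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_mels))  # (n_mfcc, n_mels)
    return dct_basis @ log_mel.T                      # (n_mfcc, n_frames)


# One second of a 440 Hz tone at 22050 Hz (synthetic test signal).
sr = 22050
y = np.sin(2 * np.pi * 440.0 * np.arange(sr) / sr)
mfccs = simple_mfcc(y, sr)
print(mfccs.shape)  # (40, 44)
```

    Each column of the result describes one short time frame, and each row is one cepstral coefficient, which is exactly the (n_mfcc, n_frames) layout seen in the question.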