Looking at the output of this code:
mfccs = librosa.feature.mfcc(y=librosa_audio, sr=librosa_sample_rate, n_mfcc=40)
print("MFCC Shape = ", mfccs.shape)
I get the output MFCC Shape = (40, 1876). What do these two numbers represent? I looked at the librosa documentation but still could not work out what these two values mean.
Any insights will be greatly appreciated!
The first dimension (40) is the number of MFCC coefficients, and the second dimension (1876) is the number of time frames. The number of coefficients is set by n_mfcc, and the number of time frames is approximately the length of the audio (in samples) divided by hop_length, which defaults to 512 in librosa.
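As a rough check of the frame count, here is a minimal sketch of the arithmetic. It assumes librosa's default settings (hop_length=512, n_fft=2048, center=True); the helper name expected_mfcc_shape and the sample count of 960,000 are made up for illustration, since the question does not state the length of the audio.

```python
def expected_mfcc_shape(n_samples, n_mfcc=40, hop_length=512, n_fft=2048, center=True):
    """Predict the (n_mfcc, n_frames) shape of an MFCC matrix.

    Hypothetical helper, not part of librosa. Defaults mirror librosa's.
    """
    if center:
        # The signal is padded by n_fft // 2 on each side, so every hop
        # position (including sample 0) gets its own frame.
        n_frames = 1 + n_samples // hop_length
    else:
        # Without padding, only frames that fit entirely inside the signal count.
        n_frames = 1 + (n_samples - n_fft) // hop_length
    return (n_mfcc, n_frames)


# Assumed clip length: 960,000 samples (about 43.5 s at 22,050 Hz).
print(expected_mfcc_shape(960_000))  # (40, 1876)
```

Any clip between 960,000 and 960,511 samples gives 1876 frames under these defaults, so the second number mostly tells you how long the recording is.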
To understand the meaning of the MFCCs themselves, you should understand the steps it takes to compute them.
A good written explainer is Haytham Fayek's "Speech Processing for Machine Learning: Filter banks, Mel-Frequency Cepstral Coefficients (MFCCs) and What's In-Between", and a good video explainer is The Sound of AI's "Mel-Frequency Cepstral Coefficients Explained Easily".
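The computation steps mentioned above can also be sketched in plain NumPy. This is a simplified illustration of the usual pipeline (framing, power spectrum, mel filter bank, logarithm, DCT), not librosa's implementation: the function name mfcc_sketch is hypothetical, and the HTK-style mel formula, symmetric Hann window, and unnormalized DCT used here are assumptions that differ from librosa's defaults, so the values will not match librosa.feature.mfcc. Only the output shape is meant to correspond.

```python
import numpy as np


def hz_to_mel(f):
    # HTK-style mel formula (an assumption; librosa defaults to the Slaney variant).
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mfcc_sketch(y, sr, n_mfcc=40, n_fft=2048, hop_length=512, n_mels=128):
    # 1. Pad so frames are centred, then cut the signal into overlapping windowed frames.
    y = np.pad(y, n_fft // 2, mode="reflect")
    n_frames = 1 + (len(y) - n_fft) // hop_length
    idx = np.arange(n_fft)[:, None] + hop_length * np.arange(n_frames)[None, :]
    frames = y[idx] * np.hanning(n_fft)[:, None]           # (n_fft, n_frames)

    # 2. Power spectrum of every frame (a short-time Fourier transform).
    power = np.abs(np.fft.rfft(frames, axis=0)) ** 2        # (1 + n_fft // 2, n_frames)

    # 3. Triangular filters spaced evenly on the mel scale.
    freqs = np.linspace(0.0, sr / 2, 1 + n_fft // 2)
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2))
    rising = (freqs[None, :] - edges[:-2, None]) / (edges[1:-1] - edges[:-2])[:, None]
    falling = (edges[2:, None] - freqs[None, :]) / (edges[2:] - edges[1:-1])[:, None]
    fbank = np.maximum(0.0, np.minimum(rising, falling))    # (n_mels, 1 + n_fft // 2)
    mel_power = fbank @ power                                # (n_mels, n_frames)

    # 4. Log-compress the mel energies (floored to avoid log of zero).
    log_mel = 10.0 * np.log10(np.maximum(mel_power, 1e-10))

    # 5. DCT-II along the mel axis; keep the first n_mfcc coefficients.
    k = np.arange(n_mfcc)[:, None]
    n = np.arange(n_mels)[None, :]
    dct_basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_mels))
    return dct_basis @ log_mel                               # (n_mfcc, n_frames)


# One second of a 440 Hz tone at 22,050 Hz (synthetic test signal).
sr = 22050
y = np.sin(2 * np.pi * 440.0 * np.arange(sr) / sr)
print(mfcc_sketch(y, sr).shape)  # (40, 44)
```

Each column of the result describes one time frame, and each row is one cepstral coefficient, which is the same layout as the (40, 1876) array in the question.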