python audio librosa mfcc

Get timing information from MFCC generated with librosa.feature.mfcc


I am extracting MFCCs from an audio file using librosa's librosa.feature.mfcc function, and I correctly get back a NumPy array with the shape I expected: 13 MFCC values for each of the 1292 windows that cover the entire audio file (30 seconds).

What is missing is timing information for each window. For example, I want to know what the MFCC looks like at 5000 ms, then at 5200 ms, and so on. Do I have to calculate the time manually, or is there a way to automatically get the exact time for each window?


Solution

  • The "timing information" is not returned directly, because it depends on the sampling rate and the hop length. To provide it, librosa would have to create its own classes, which would clutter the interface and make it much less interoperable. In the current implementation, feature.mfcc returns a plain numpy.ndarray, so you can easily integrate it with any other Python code.

    To relate MFCC to timing:

    import librosa
    import numpy as np
    
    # bundled example clip; newer librosa releases replace this helper with librosa.ex(...)
    filename = librosa.util.example_audio_file()
    y, sr = librosa.load(filename) # resamples to 22050 Hz by default
    
    hop_length = 512 # number of samples between successive frames
    mfcc = librosa.feature.mfcc(y=y, n_mfcc=13, sr=sr, hop_length=hop_length)
    
    audio_length = len(y) / sr # in seconds
    step = hop_length / sr # in seconds
    # time stamp of each MFCC frame: frame i sits at i * hop_length / sr seconds
    intervals_s = np.arange(start=0, stop=audio_length, step=step)
    
    print(f'MFCC shape: {mfcc.shape}')
    print(f'intervals_s shape: {intervals_s.shape}')
    print(f'First 5 intervals: {intervals_s[:5]} second')
    

    Note that the number of frames in mfcc (its second dimension) equals the length of intervals_s, which is a sanity check that we did not make a mistake in the calculation.

    MFCC shape: (13, 2647) 
    intervals_s shape: (2647,)
    First 5 intervals: [0.         0.02321995 0.04643991 0.06965986 0.09287982] second
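
To answer the "what does the MFCC look like at 5000 ms" part of the question, the same arithmetic can be inverted to go from a time to a frame index. The following is a minimal sketch in plain Python rather than librosa's own API: the helper names frame_count, frame_time_s and nearest_frame are made up for illustration, and it assumes librosa's defaults (sr=22050, hop_length=512, centered frames). librosa also ships librosa.frames_to_time and librosa.time_to_frames for these conversions, which are probably the better choice in real code.

```python
# Hypothetical helpers (not part of librosa), assuming centered frames,
# where frame i is associated with sample i * hop_length.

def frame_count(n_samples: int, hop_length: int = 512) -> int:
    """Number of frames for a signal of n_samples with centered analysis."""
    return 1 + n_samples // hop_length


def frame_time_s(frame: int, sr: int = 22050, hop_length: int = 512) -> float:
    """Time stamp, in seconds, of the given frame index."""
    return frame * hop_length / sr


def nearest_frame(time_ms: float, sr: int = 22050, hop_length: int = 512) -> int:
    """Index of the frame whose time stamp is closest to time_ms."""
    return int(round(time_ms / 1000 * sr / hop_length))


# A 30 second clip at 22050 Hz, as in the question
n_frames = frame_count(30 * 22050)
print(f'frames in 30 s: {n_frames}')

for t_ms in (5000, 5200):
    idx = nearest_frame(t_ms)
    print(f'{t_ms} ms -> frame {idx} at {frame_time_s(idx):.4f} s')
```

With an mfcc array from the code above, mfcc[:, nearest_frame(5000)] would then be the 13 coefficients closest to the 5 second mark. Because frames are hop_length / sr (about 23 ms) apart, a requested time rarely lands exactly on a frame, so the lookup picks the closest one.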