Tags: list, audio, librosa, mfcc

Match MFCC to video frames


I extracted video frames and MFCCs from a video. The video frames have shape (524, 64, 64) and the MFCC array has shape (80, 525). The number of frames matches, but the MFCC dimensions are reversed. How can I rearrange the MFCC array so that it has shape (525, 80)?

Also, will permuting the dimensions distort the audio information?


Solution

  • Swapping the dimensions of a multidimensional array does not alter the values at all, only their locations.

    To make the time axis the first dimension of your MFCC array, use NumPy's .T (transpose) attribute, which turns the (80, 525) array into a (525, 80) one.

    mfcc_timefirst = mfcc.T
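
If you want to convince yourself that nothing is lost, here is a minimal sketch that checks it. The random array is only a placeholder for your real MFCC matrix, and the final trimming step is an assumption on my part (the question does not say how the extra MFCC frame, 525 versus 524 video frames, should be handled), so treat that part as one possible option rather than the correct alignment.

```python
import numpy as np

# Placeholder for the real MFCC matrix: 80 coefficients x 525 time steps.
# (Random data, used only to illustrate the shapes from the question.)
rng = np.random.default_rng(0)
mfcc = rng.standard_normal((80, 525))

# Transpose so that time is the first axis.
mfcc_timefirst = mfcc.T

# Element [t, c] of the transposed array is element [c, t] of the original,
# so transposing back must reproduce the original array exactly.
values_preserved = np.array_equal(mfcc_timefirst.T, mfcc)

# .T returns a view on the same memory, so no data is copied or modified.
shares_memory = np.shares_memory(mfcc, mfcc_timefirst)

# ASSUMPTION: drop the trailing MFCC frame(s) so the count equals the
# number of video frames. Whether this is the right alignment depends on
# how the audio was framed; it is shown here only as one simple option.
n_video_frames = 524
mfcc_aligned = mfcc_timefirst[:n_video_frames]

print(mfcc_timefirst.shape, mfcc_aligned.shape, values_preserved, shares_memory)
```

Because .T gives a view rather than a copy, writing into mfcc_timefirst also changes mfcc. Call mfcc.T.copy() if you need an independent array.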