Calling librosa.feature.mfcc() on an audio file spits out a 2D array like so:
array([[ -5.229e+02, -4.944e+02, ..., -5.229e+02, -5.229e+02],
       [  7.105e-15,  3.787e+01, ..., -7.105e-15, -7.105e-15],
       ...,
       [  1.066e-14, -7.500e+00, ...,  1.421e-14,  1.421e-14],
       [  3.109e-14, -5.058e+00, ...,  2.931e-14,  2.931e-14]])
My question is: what are these values? I was expecting a 1D array of coefficients, so why is it 2D, and what do the two dimensions represent? Maybe I have misunderstood what I should be getting back, but any explanation would be appreciated. I tried looking online, but everyone seems to assume you already know what it is.
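One dimension is time, and the other is the MFCC coefficients. librosa returns an array of shape (n_mfcc, t): each row is one cepstral coefficient (n_mfcc defaults to 20), and each column is one short analysis frame of the audio. You get a full coefficient vector for every frame, not a single vector for the whole file, which is why the result is 2D. This link shows how it looks if you plot it: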