I am using the following general-purpose code to extract scaled MFCC data:
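import librosa
import numpy as np

def extract_features(file_name):
    try:
        # Load the file (resampled to librosa's default 22050 Hz) and compute 40 MFCCs per frame
        audio, sample_rate = librosa.load(file_name, res_type='kaiser_fast')
        mfccs = librosa.feature.mfcc(y=audio, sr=sample_rate, n_mfcc=40)
        # Average over time: shape (40, n_frames) -> (40,)
        mfccsscaled = np.mean(mfccs.T, axis=0)
    except Exception as e:
        print("Error encountered while parsing file: ", file_name, e)
        return None
    return mfccsscaled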
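For reference, the np.mean(mfccs.T, axis=0) step in extract_features collapses the time axis, so its output length does not depend on how long the clip is. A minimal NumPy sketch, using random stand-in arrays rather than real MFCCs (mean_over_time is a hypothetical helper name):

```python
import numpy as np


def mean_over_time(mfccs: np.ndarray) -> np.ndarray:
    """Average an (n_mfcc, n_frames) matrix over its frame axis.

    Same computation as np.mean(mfccs.T, axis=0) in extract_features.
    """
    return np.mean(mfccs.T, axis=0)


# Stand-in matrices with 40 coefficients and very different frame counts
# (random values, not real MFCCs).
rng = np.random.default_rng(0)
short_clip = rng.standard_normal((40, 174))
long_clip = rng.standard_normal((40, 8438))

print(mean_over_time(short_clip).shape)  # (40,)
print(mean_over_time(long_clip).shape)   # (40,)
```

Because every clip ends up as a 40-element vector, this path never needs padding; only the per-frame matrix used in the single-file example below does.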
Example code applied to a single file:
max_pad_len = 174
file_name = '201-AWCKARAK47Close0116BIT.wav'
audio, sample_rate = librosa.load(file_name, res_type='kaiser_fast', sr=None)
mfccs = librosa.feature.mfcc(y=audio, sr=sample_rate, n_mfcc=40)
pad_width = max_pad_len - mfccs.shape[1]
mfccs = np.pad(mfccs, pad_width=((0, 0), (0, pad_width)), mode='constant')
mfccsscaled
I get the following error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-26-118328675a5f> in <module>
4 mfccs = librosa.feature.mfcc(y=audio, sr=sample_rate, n_mfcc=40)
5 pad_width = max_pad_len - mfccs.shape[1]
----> 6 mfccs = np.pad(mfccs, pad_width=((0, 0), (0, pad_width)), mode='constant')
7 mfccsscaled
<__array_function__ internals> in pad(*args, **kwargs)
c:\python\lib\site-packages\numpy\lib\arraypad.py in pad(array, pad_width, mode, **kwargs)
746
747 # Broadcast to shape (array.ndim, 2)
--> 748 pad_width = _as_pairs(pad_width, array.ndim, as_index=True)
749
750 if callable(mode):
c:\python\lib\site-packages\numpy\lib\arraypad.py in _as_pairs(x, ndim, as_index)
517
518 if as_index and x.min() < 0:
--> 519 raise ValueError("index can't contain negative values")
520
521 # Converting the array with `tolist` seems to improve performance
ValueError: index can't contain negative values
Can you tell me why this error is being thrown and how to work around it?
BACKGROUND
I am using files obtained from https://www.boomlibrary.com/. Most of the files have a 24-bit depth. I tried converting the original wav files both down to 16-bit and up to 32-bit. Even after passing both versions through librosa, the sample values do not stay within [-1, 1]; the min~max range librosa reports is -1.2105241 to 1.2942984. I'm not sure whether this detail helps in resolving my question. Thanks!
You are passing a negative pad width to np.pad, as the exception indicates.
The problem stems from this line:
pad_width = max_pad_len - mfccs.shape[1]
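A minimal sketch that reproduces the error with NumPy alone, using dummy zero arrays in place of real MFCCs; pad_frames is a hypothetical helper that wraps the two lines from the question:

```python
import numpy as np


def pad_frames(mfccs: np.ndarray, max_pad_len: int) -> np.ndarray:
    """Zero-pad the frame axis up to max_pad_len, exactly as in the question."""
    pad_width = max_pad_len - mfccs.shape[1]
    return np.pad(mfccs, pad_width=((0, 0), (0, pad_width)), mode='constant')


# 10 frames padded up to 174: pad_width is 164, which works
print(pad_frames(np.zeros((40, 10)), 174).shape)  # (40, 174)

# 200 frames "padded" to 174: pad_width is -26, so np.pad raises ValueError
try:
    pad_frames(np.zeros((40, 200)), 174)
except ValueError as err:
    print("ValueError:", err)
```

Any clip that yields more than max_pad_len frames hits the same exception.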
mfccs.shape[1] is the number of frames, which is proportional to the audio length and depends on the hop length used to compute the MFCCs. By default, hop_length is 512.
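The frame count can be sketched as a one-line formula, assuming librosa's default center=True and an even n_fft (the default n_fft is 2048): the signal is padded by n_fft // 2 samples on each side, and that padding cancels the window length, leaving 1 + n_samples // hop_length frames. num_frames is a hypothetical helper, and 45 seconds is the approximate clip duration discussed here:

```python
def num_frames(n_samples: int, hop_length: int = 512) -> int:
    """Number of MFCC frames librosa returns for a signal of n_samples samples.

    Assumes center=True (librosa's default) and an even n_fft: padding of
    n_fft // 2 on both sides adds n_fft samples, which cancels the window
    length and leaves 1 + n_samples // hop_length frames.
    """
    return 1 + n_samples // hop_length


clip_seconds = 45  # approximate clip duration
print(num_frames(clip_seconds * 96000))  # 8438 (sr=None keeps the native 96 kHz)
print(num_frames(clip_seconds * 22050))  # 1938 (librosa's default sr=22050)
```

Even at librosa's default sr=22050, which is what extract_features uses because it does not pass sr, a 45-second clip gives far more than 174 frames.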
The audio in question, 201-AWCKARAK47Close0116BIT.wav, is a roughly 45-second clip sampled at 96 kHz. Your single-file example loads it with sr=None, so that native rate is kept. A back-of-the-envelope calculation gives the number of MFCC frames you will get for this file:
45 seconds * (96000 samples / second) / (512 samples / frame) ~ 8500 frames
Therefore:
pad_width = max_pad_len - mfccs.shape[1] ~ 174 - 8500 < 0, i.e. a negative pad width
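To work around it, make the frame count fixed: truncate clips that are longer than max_pad_len and zero-pad the shorter ones, so pad_width can never go negative. A minimal NumPy sketch (fix_frames is a hypothetical helper, not a librosa function; the arrays are dummy data):

```python
import numpy as np


def fix_frames(mfccs: np.ndarray, max_pad_len: int = 174) -> np.ndarray:
    """Return an (n_mfcc, max_pad_len) matrix: truncate long clips, zero-pad short ones."""
    mfccs = mfccs[:, :max_pad_len]            # drop frames beyond max_pad_len
    pad_width = max_pad_len - mfccs.shape[1]  # now always >= 0
    return np.pad(mfccs, pad_width=((0, 0), (0, pad_width)), mode='constant')


long_clip = np.ones((40, 8438))  # roughly the 45 s, 96 kHz case
short_clip = np.ones((40, 100))

print(fix_frames(long_clip).shape)   # (40, 174)
print(fix_frames(short_clip).shape)  # (40, 174)
```

Truncation discards everything after the first 174 frames (about 0.9 s at 96 kHz with hop_length=512), so for long clips you may prefer to raise max_pad_len to fit your longest file, use a larger hop_length, or keep the time-averaged vector from extract_features. librosa.util.fix_length(mfccs, size=max_pad_len, axis=1) should do the same truncate-or-pad job.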