Tags: audio, speech-recognition, dynamic-programming, mfcc, dtw

Comparing MFCC feature vectors with DTW


I'm looking for some advice on Dynamic Time Warping (DTW).

I have a Python script that extracts Mel-Frequency Cepstral Coefficient (MFCC) feature vectors from .WAV files of various lengths. The result for each file is an array of varying length whose elements are arrays of 12 MFCCs.

For example, one .WAV file may be represented by an array containing 10 vectors of 12 MFCCs each, whilst another .WAV file may be represented by an array containing 20 such vectors.
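
For concreteness, here is a minimal sketch of the kind of arrays I'm describing (using librosa purely for illustration; the file names are placeholders):

    import librosa

    # Load two .WAV files of different lengths (file names are placeholders).
    y1, sr1 = librosa.load("clip_a.wav", sr=None)
    y2, sr2 = librosa.load("clip_b.wav", sr=None)

    # Extract 12 MFCCs per frame and transpose so each row is one frame's 12-dim vector.
    mfcc_a = librosa.feature.mfcc(y=y1, sr=sr1, n_mfcc=12).T  # shape: (n_frames_a, 12)
    mfcc_b = librosa.feature.mfcc(y=y2, sr=sr2, n_mfcc=12).T  # shape: (n_frames_b, 12)

    print(mfcc_a.shape, mfcc_b.shape)  # e.g. (10, 12) and (20, 12)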

I intend to use DTW to compare the two arrays of arrays, but I'm unsure how. I understand the concept of DTW and would have no issue implementing it if the elements of each array were single numbers; my confusion comes from the fact that they are themselves arrays of 12 MFCCs.

Tl;dr: How would one compare two arrays of arrays using DTW?

Edit: I have read this question to no avail.

Many thanks, Adam


Solution

  • There is a nice tutorial on DTW here

    I have done this in a dozen papers; see the zebra finch example here.

    A key thing to note: you probably want to compare just ONE feature to the corresponding feature. It is rare that it is useful to use all 12. A minimal sketch along these lines is given below.
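
To make this concrete, here is a minimal DTW sketch in plain NumPy (dtw_distance and dist are illustrative names, not from any particular library). The local cost is pluggable: you can follow the advice above and compare a single MFCC coefficient series, or use the Euclidean distance between whole 12-dimensional frames if you do want to compare the full vectors.

    import numpy as np

    def dtw_distance(seq_a, seq_b, dist=lambda x, y: np.linalg.norm(x - y)):
        """Classic DTW accumulated cost. seq_a and seq_b are sequences of frames;
        dist compares two frames and returns a non-negative number."""
        n, m = len(seq_a), len(seq_b)
        cost = np.full((n + 1, m + 1), np.inf)
        cost[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                d = dist(seq_a[i - 1], seq_b[j - 1])
                # Take the cheapest of the three allowed moves: insertion, deletion, match.
                cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
        return cost[n, m]

    # Stand-ins for the real MFCC arrays: 10 and 20 frames of 12 coefficients each.
    mfcc_a = np.random.randn(10, 12)
    mfcc_b = np.random.randn(20, 12)

    # Whole 12-dim frames, with Euclidean frame-to-frame distance.
    print(dtw_distance(mfcc_a, mfcc_b))

    # A single coefficient only (here coefficient 0), as suggested above.
    print(dtw_distance(mfcc_a[:, 0], mfcc_b[:, 0], dist=lambda x, y: abs(x - y)))

The only change needed for "arrays of arrays" is the local distance: instead of |a - b| between two numbers, use a distance between the two frames (one per array); the warping recursion itself is unchanged.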