audio machine-learning weka feature-extraction mfcc

How to use MFCCs in Weka for audio classification?


I am trying to develop a method to classify audio using MFCCs in Weka. The MFCCs I have are generated with a buffer size of 1024, so there is a series of MFCC coefficients for each audio recording. I want to convert these coefficients into the ARFF data format for Weka, but I'm not sure how to approach this problem.

I also asked a separate question about merging the data, because I feel that may affect the conversion to ARFF format.

I know that in an ARFF file the data needs to be described through attributes. Should each coefficient of the MFCC be a separate attribute, or should the array of coefficients be a single attribute? Should each data row represent a single MFCC, a window of time, or the entire file or sound? Below, I wrote out what I think it would look like if it only took one MFCC into account, which I don't think would be enough to classify an entire sound.

@relation audio

@attribute mfcc1 real
@attribute mfcc2 real
@attribute mfcc3 real
@attribute mfcc4 real
@attribute mfcc5 real
@attribute mfcc6 real
@attribute mfcc7 real
@attribute mfcc8 real
@attribute mfcc9 real
@attribute mfcc10 real
@attribute mfcc11 real
@attribute mfcc12 real
@attribute mfcc13 real
@attribute class {bark, honk, talking, wind}

@data
126.347275, -9.709645, 4.2038302, -11.606304, -2.4174862, -3.703139, 12.748064, -5.297932, -1.3114156, 2.1852574, -2.1628475, -3.622149, 5.851326, bark

Any help will be greatly appreciated.

Edit: I have generated some ARFF files for Weka using openSMILE, following a method from this website, but I am not sure how this data would be used to classify the audio, because each row of data is 10 milliseconds of audio from the same file. The name attribute of each row is "unknown," which I assume is the attribute that the classifier would try to predict. How would I be able to classify an overall sound (rather than 10 milliseconds) and compare it to several other overall sounds?


Edit #2: Success!

After reading the website I found more thoroughly, I saw the Accumulate script and the Test and Train data files. The Accumulate script combines the sets of MFCC data generated from separate audio files into one ARFF file. Their file was composed of about 200 attributes with stats for 12 MFCCs. Although I wasn't able to retrieve these stats using openSMILE, I used Python libraries to do so. The stats were max, min, kurtosis, range, standard deviation, and so on. I then classified my audio files using BayesNet and Multilayer Perceptron in Weka, which both yielded 100% accuracy for me.
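For anyone following along, here is a minimal sketch of the kind of per-file aggregation described above. It is not the website's Accumulate script or the exact code I ran: the function names `summarize` and `file_features`, the `mfccN_stat` attribute naming, the inclusion of the mean, and the toy frames are all assumptions for illustration. It uses only the Python standard library.

```python
import math


def summarize(values):
    """Summary statistics for one MFCC coefficient across all frames of a file."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n  # population variance
    # Excess kurtosis (0 for a normal distribution); defined as 0.0 here
    # when the variance is zero, which is an assumption of this sketch.
    if var > 0:
        kurt = (sum((v - mean) ** 4 for v in values) / n) / var ** 2 - 3.0
    else:
        kurt = 0.0
    return {
        "max": max(values),
        "min": min(values),
        "range": max(values) - min(values),
        "mean": mean,
        "std": math.sqrt(var),
        "kurtosis": kurt,
    }


def file_features(frames):
    """Collapse a list of MFCC frames into one flat feature vector per file.

    frames: list of frames, each a list with the same number of coefficients.
    Returns (attribute_names, feature_values).
    """
    n_coeffs = len(frames[0])
    names, feats = [], []
    for i in range(n_coeffs):
        column = [frame[i] for frame in frames]  # coefficient i over time
        for stat, value in summarize(column).items():
            names.append(f"mfcc{i + 1}_{stat}")
            feats.append(value)
    return names, feats


# Toy input: 3 frames with 2 coefficients each (made-up numbers).
frames = [[1.0, 10.0], [2.0, 10.0], [3.0, 10.0]]
names, feats = file_features(frames)
```

Because every file is reduced to the same fixed set of statistics, recordings of different lengths all end up with the same number of attributes, which is what lets each one become a single ARFF row.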


Solution

  • I don't know much about MFCCs, but if you are trying to classify audio files then each line under @data must represent one audio file. If you used windows of time or only one MFCC for each line under @data then the Weka classifiers would be trying to classify windows of time or MFCCs, which is not what you want. Just in case you are unfamiliar with the format (just linking because I saw you put the features of an audio file on the same line as @data), here is an example where each line represents an Iris Plant:

    % 1. Title: Iris Plants Database
    % 
    % 2. Sources:
    %      (a) Creator: R.A. Fisher
    %      (b) Donor: Michael Marshall (MARSHALL%PLU@io.arc.nasa.gov)
    %      (c) Date: July, 1988
    % 
    @RELATION iris
    
    @ATTRIBUTE sepallength  NUMERIC
    @ATTRIBUTE sepalwidth   NUMERIC
    @ATTRIBUTE petallength  NUMERIC
    @ATTRIBUTE petalwidth   NUMERIC
    @ATTRIBUTE class        {Iris-setosa,Iris-versicolor,Iris-virginica}
    
    @DATA
    5.1,3.5,1.4,0.2,Iris-setosa
    4.9,3.0,1.4,0.2,Iris-setosa
    4.7,3.2,1.3,0.2,Iris-setosa
    4.6,3.1,1.5,0.2,Iris-setosa
    5.0,3.6,1.4,0.2,Iris-setosa
    5.4,3.9,1.7,0.4,Iris-setosa
    4.6,3.4,1.4,0.3,Iris-setosa
    5.0,3.4,1.5,0.2,Iris-setosa
    4.4,2.9,1.4,0.2,Iris-setosa
    4.9,3.1,1.5,0.1,Iris-setosa
    

    As for which attributes you should use for your audio files, it sounds (no pun intended) like using the MFCC coefficients could work, assuming every audio file has the same number of MFCCs, because every data instance (audio file) must have the same number of attributes. I would try it out and see how it goes.
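As a rough sketch of the one-row-per-audio-file layout, the snippet below builds ARFF text in plain Python and rejects any file whose feature count does not match the attribute list. This is an illustration only, not a Weka API: the helper `to_arff` and its parameters are hypothetical, the first data row reuses the first two values from the question, and the second row is made up.

```python
def to_arff(relation, attribute_names, class_labels, rows):
    """Build ARFF text with one @data row per audio file.

    rows: list of (feature_values, label) pairs, one pair per audio file.
    """
    lines = [f"@relation {relation}", ""]
    lines += [f"@attribute {name} numeric" for name in attribute_names]
    lines.append("@attribute class {" + ",".join(class_labels) + "}")
    lines += ["", "@data"]
    for values, label in rows:
        # Every instance must have exactly one value per declared attribute.
        if len(values) != len(attribute_names):
            raise ValueError(
                f"expected {len(attribute_names)} values, got {len(values)}"
            )
        if label not in class_labels:
            raise ValueError(f"unknown class label: {label}")
        lines.append(",".join(str(v) for v in values) + "," + label)
    return "\n".join(lines) + "\n"


# Two hypothetical audio files, each reduced to two numeric features.
arff_text = to_arff(
    "audio",
    ["mfcc1", "mfcc2"],
    ["bark", "honk"],
    [([126.347275, -9.709645], "bark"), ([1.5, -2.0], "honk")],
)
```

Writing `arff_text` to a file with an `.arff` extension should give something Weka can load, with each row standing for a whole recording rather than a window of time.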

    EDIT: If the audio files are not the same size you could: