matlab probability gaussian speaker mixture-model

MATLAB code for a Gaussian Mixture Model with many components


I used gaussmix from the Voicebox toolbox for MATLAB to fit a Gaussian Mixture Model (GMM). However, the code throws an error when I run it with 512 mixture components.

No_of_Clusters = 512;
No_of_Iterations = 10;
[m_ubm1,v_ubm1,w_ubm1]=gaussmix(feature,[],No_of_Iterations,No_of_Clusters);

Error using  * 
Inner matrix dimensions must agree.

Error in gaussmix (line 256)
pk=px*wt;                       % pk(k,1) effective number of data points for each mixture (could be    zero due to underflow)

I need 1024 or 2048 mixtures to build a Universal Background Model (UBM). How can I train a GMM with such a large number of components?


Solution

  • Do you want to use it for speech processing? If so, the best approach is to use the MSR Identity Toolbox, written by Dr. Omid Sadjadi of Microsoft Research, who guided me on how to use it (you will also need Voicebox). Here is an example code snippet that you can use to extract MFCCs from speech stored in WAV files (assuming a 16 kHz sample rate):

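    addpath('path_to_voicebox');            % Voicebox: melcepst, writehtk
    addpath('path_to_identity_toolbox');    % Identity Toolbox: rm_dc_n_dither, cmvn
    [s, fs] = wavread(speechFilename);      % newer MATLAB releases replace wavread with audioread
    fL = 100.0/fs;                          % lower filterbank edge (100 Hz) as a fraction of fs
    fH = 8000.0/fs;                         % upper filterbank edge (8 kHz) as a fraction of fs
    fRate = 0.010 * fs;                     % frame shift: 10 ms, in samples
    fSize = 0.025 * fs;                     % frame length: 25 ms, in samples
    nChan = 27;                             % number of mel filterbank channels
    nCeps = 12;                             % number of cepstral coefficients (excluding c0)
    premcoef = 0.97;                        % pre-emphasis coefficient
    s = rm_dc_n_dither(s, fs);              % remove DC offset and add dither
    s = filter([1 -premcoef], 1, s);        % apply the pre-emphasis filter
    mfc = melcepst(s, fs, '0dD', nCeps, nChan, fSize, fRate, fL, fH);  % '0dD': add c0, deltas and delta-deltas
    mfc = cmvn(mfc', true);                 % mean and variance normalization
    writehtk(featureFilename, mfc', 100000, 9);  % write the features in HTK format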
    

    The above code extracts 39-dimensional MFCCs (12 cepstral coefficients plus c0, together with their delta and delta-delta coefficients) from the pre-emphasized speech signal, applies mean and variance normalization to the features, and finally writes them to disk in HTK format. Note that this is only an example; you may modify it to suit your needs and resources. The two functions "rm_dc_n_dither" and "cmvn" are from the Identity Toolbox. Both Voicebox and the Identity Toolbox must be on the MATLAB path (see the first two lines of the code above). For voice activity detection (VAD), you can use the "vadsohn" function from Voicebox, which outputs frame-level decisions (0 for silence, 1 for speech) at a 10 ms frame skip rate.
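
    To make the normalization step concrete, here is a minimal sketch of per-utterance mean and variance normalization in base MATLAB. It is not the Identity Toolbox "cmvn" code; the toy matrix "feat" and the dims-by-frames layout are assumptions chosen to match how "mfc'" is passed in the snippet above.

    ```matlab
    % Illustrative mean and variance normalization (not the toolbox's cmvn).
    % Assumed layout: one row per feature dimension, one column per frame.
    feat = [1 2 3 4; 10 20 30 40];     % toy features: 2 dims x 4 frames
    mu = mean(feat, 2);                % per-dimension mean over frames
    sd = std(feat, 0, 2);              % per-dimension standard deviation
    sd(sd == 0) = 1;                   % guard against constant dimensions
    featNorm = bsxfun(@rdivide, bsxfun(@minus, feat, mu), sd);
    ```

    After this step each row of "featNorm" has zero mean and unit variance over the utterance, which is what the normalization in the extraction code is meant to achieve.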

    After you have extracted the features from your database, you can follow the procedure in gmm_ubm_demo, provided with the Identity Toolbox, to train a UBM.
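
    For intuition about what UBM training involves, here is a rough sketch of EM for a diagonal-covariance GMM in base MATLAB. It is not the Identity Toolbox or Voicebox implementation; the toy data, the random initialization, the variance floor of 1e-3 and all variable names are assumptions made for illustration. Posteriors are computed in the log domain so that they do not underflow when there are many components.

    ```matlab
    % Minimal diagonal-covariance GMM trained with EM (illustrative only).
    rng(0);                                        % reproducible toy data
    X = [randn(2, 500) - 3, randn(2, 500) + 3];    % dims x frames, two clusters
    [D, N] = size(X);
    K = 2;                                         % number of mixtures (toy value)
    nIter = 10;                                    % number of EM iterations

    % Initialization: means from random frames, global variance, equal weights
    mu = X(:, randperm(N, K));                     % D x K
    sigma2 = repmat(var(X, 0, 2), 1, K);           % D x K
    w = ones(1, K) / K;                            % 1 x K

    for it = 1:nIter
        % E-step: logp(k,n) = log( w_k * N(x_n | mu_k, diag(sigma2_k)) )
        logp = zeros(K, N);
        for k = 1:K
            d = bsxfun(@minus, X, mu(:, k));       % D x N deviations
            logp(k, :) = log(w(k)) ...
                - 0.5 * (D * log(2*pi) + sum(log(sigma2(:, k)))) ...
                - 0.5 * sum(bsxfun(@rdivide, d.^2, sigma2(:, k)), 1);
        end
        % Log-sum-exp over components gives the per-frame log-likelihood
        mx = max(logp, [], 1);
        llk = mx + log(sum(exp(bsxfun(@minus, logp, mx)), 1));   % 1 x N
        post = exp(bsxfun(@minus, logp, llk));     % K x N responsibilities

        % M-step: re-estimate weights, means and variances
        Nk = sum(post, 2)';                        % 1 x K soft counts
        w = Nk / N;
        mu = bsxfun(@rdivide, X * post', Nk);      % D x K
        sigma2 = bsxfun(@rdivide, (X.^2) * post', Nk) - mu.^2;
        sigma2 = max(sigma2, 1e-3);                % variance floor
    end
    ```

    For a real UBM with 1024 or 2048 mixtures you would still want the toolbox, since a plain loop like this takes no care over memory use or initialization at that scale.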

    In case you would like to replicate our demo results on TIMIT, you may download the list files (not included in the toolbox) from here.