python numpy scikit-learn gaussianmixture

Initializing Gaussian Mixture parameters in Python with sklearn


I'm trying really hard to fit a Gaussian Mixture with sklearn, but I think I'm missing something because it definitely doesn't work.

My original data looks like this:

Genotype LogRatio  Strength
AB       0.392805  10.625016
AA       1.922468  10.765716
AB       0.22074   10.405445
BB       -0.059783 10.625016

I want to do a Gaussian Mixture with 3 components = 3 genotypes (AA|AB|BB). I know the weight of each genotype, the mean of Log Ratio for each genotype and the mean of Strength for each genotype.

wgts = [0.8,0.19,0.01]  # weight of AA,AB,BB
means = [[-0.5,9],[0.5,9],[1.5,9]] # mean(LogRatio), mean(Strength) for AA,AB,BB

I keep columns LogRatio and Strength and create a NumPy array.

datas = [[  0.392805  10.625016]
         [  1.922468  10.765716]
         [  0.22074   10.405445]
         [ -0.059783   9.798655]]
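
For reference, the column selection step can be sketched as follows. This is a minimal sketch that assumes the table is loaded with pandas (not mentioned above) and uses only the four sample rows shown in the table; `raw` and `df` are illustrative names.

```python
import io

import pandas as pd

# The four sample rows from the table above, as whitespace-separated text.
raw = """Genotype LogRatio Strength
AB 0.392805 10.625016
AA 1.922468 10.765716
AB 0.22074 10.405445
BB -0.059783 10.625016
"""

# Parse the table, then keep only the two numeric columns as a 2-D array
# with one row per sample and one column per feature.
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")
datas = df[["LogRatio", "Strength"]].to_numpy()
```

The resulting array has shape (n_samples, 2), which is the layout `fit` and `predict` expect.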

Then I tested the GaussianMixture class from sklearn.mixture (sklearn v0.18) and also tried the older GMM class from sklearn v0.17 (I still don't see the difference and don't know which one to use).

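import matplotlib.pyplot as plt
from sklearn import mixture

gmm = mixture.GMM(n_components=3)  # older API (sklearn v0.17)
# OR
gmm = mixture.GaussianMixture(n_components=3)  # sklearn v0.18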

gmm.fit(datas)

colors = ['r' if i==0 else 'b' if i==1 else 'g' for i in gmm.predict(datas)]
ax = plt.gca()
ax.scatter(datas[:,0], datas[:,1], c=colors, alpha=0.8)
plt.show()

This is what I obtain. It is a good result, but it changes every time because the initial parameters are computed differently on each run.

[image: scatter plot of the data colored by predicted component]

I would like to initialize my parameters in the GaussianMixture or GMM class, but I don't understand how I have to format my data :(


Solution

  • You can control the randomness, and so make the results reproducible, by explicitly setting random_state, which seeds the pseudo-random number generator.

    Instead of:

    gmm = mixture.GaussianMixture(n_components=3)
    

    Do:

    gmm = mixture.GaussianMixture(n_components=3, random_state=3)
    

    random_state can be an int (a RandomState instance is also accepted): I arbitrarily set it to 3, but you can choose any other integer.

    When you run the code multiple times with the same random_state, you will get the same results.
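
As a rough illustration of both points (reproducibility and the question's wish to start from known parameters), here is a minimal sketch. It assumes sklearn v0.18 or later, where GaussianMixture accepts weights_init and means_init. The data is synthetic stand-in data generated around the means from the question, not the asker's real dataset, and the sample counts and noise level are arbitrary choices.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Known starting values from the question (order: AA, AB, BB).
wgts = np.array([0.8, 0.19, 0.01])
means = np.array([[-0.5, 9.0], [0.5, 9.0], [1.5, 9.0]])

# Synthetic stand-in data (assumption): tight 2-D clusters around each mean,
# columns = (LogRatio, Strength), sized roughly in proportion to the weights.
rng = np.random.RandomState(0)
counts = [1600, 380, 20]
datas = np.vstack([m + 0.1 * rng.randn(n, 2) for m, n in zip(means, counts)])

# 1) Same random_state -> identical predictions from run to run.
labels_a = GaussianMixture(n_components=3, random_state=3).fit(datas).predict(datas)
labels_b = GaussianMixture(n_components=3, random_state=3).fit(datas).predict(datas)
same_labels = np.array_equal(labels_a, labels_b)

# 2) Start EM from the known weights and means instead of a random guess.
#    weights_init has shape (n_components,) and must sum to 1;
#    means_init has shape (n_components, n_features).
gmm = GaussianMixture(
    n_components=3,
    weights_init=wgts,
    means_init=means,
    random_state=3,
)
gmm.fit(datas)
labels = gmm.predict(datas)  # component index per sample
```

The initial values are only a starting point: EM still updates the weights and means during fitting, so gmm.weights_ and gmm.means_ hold the fitted estimates afterwards. On well-separated data like this stand-in set, the components would be expected to stay in the order of means_init, which makes it easier to map component indices back to genotypes.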