python matplotlib scikit-learn histogram gmm

How can I plot a histogram with a 1D Gaussian mixture using sklearn?


I would like to plot a histogram with a 1D Gaussian mixture overlaid, as in this picture:

[image: example histogram with Gaussian mixture curves overlaid]

Thanks to Meng for the picture.

My histogram looks like this:

[image: the asker's histogram]

I have a file with a lot of data (4,000,000 numbers) in a single column:

1.727182
1.645300
1.619943
1.709263
1.614427
1.522313

I'm using the following script, with modifications made by Meng and Justice Lord:

from matplotlib import rc
from sklearn import mixture
import matplotlib.pyplot as plt
import numpy as np
import matplotlib
import matplotlib.ticker as tkr
import scipy.stats as stats

x = open("prueba.dat").read().splitlines()

f = np.ravel(x).astype(np.float)
f=f.reshape(-1,1)
g = mixture.GaussianMixture(n_components=3,covariance_type='full')
g.fit(f)
weights = g.weights_
means = g.means_
covars = g.covariances_

plt.hist(f, bins=100, histtype='bar', density=True, ec='red', alpha=0.5)
plt.plot(f,weights[0]*stats.norm.pdf(f,means[0],np.sqrt(covars[0])), c='red')
plt.rcParams['agg.path.chunksize'] = 10000

plt.grid()
plt.show()

When I run the script, I get the following plot:

[image: plot produced by the script]

So I have no idea how to set the start and end of all the Gaussians that should be there. I'm new to Python and confused about how to use the modules. Could you please help me and show me how to make this plot?

Thanks a lot


Solution

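  • It's all about the shape and order of the arrays. First, f must be reshaped to (n_samples, 1) for fit. For plotting, you need a sorted 1D x-axis; otherwise plt.plot connects the points in file order and the line zigzags across the figure. stats.norm.pdf broadcasts against means[0] and covars[0], so flatten its result with ravel() before plotting.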

    from sklearn import mixture
    import matplotlib.pyplot as plt
    import numpy as np
    import scipy.stats as stats

    # x = open("prueba.dat").read().splitlines()

    # create the data
    x = np.concatenate((np.random.normal(5, 5, 1000), np.random.normal(10, 2, 1000)))

    # GaussianMixture expects a 2D array of shape (n_samples, n_features)
    f = np.ravel(x).astype(float)  # np.float was removed in NumPy 1.24; use float
    f = f.reshape(-1, 1)
    g = mixture.GaussianMixture(n_components=3, covariance_type='full')
    g.fit(f)
    weights = g.weights_
    means = g.means_
    covars = g.covariances_

    plt.hist(f, bins=100, histtype='bar', density=True, ec='red', alpha=0.5)

    # sorted 1D copy of the data to use as the x-axis of the curve
    f_axis = f.copy().ravel()
    f_axis.sort()
    # weighted PDF of the first component (index 0) only
    plt.plot(f_axis, weights[0]*stats.norm.pdf(f_axis, means[0], np.sqrt(covars[0])).ravel(), c='red')

    plt.rcParams['agg.path.chunksize'] = 10000

    plt.grid()
    plt.show()
    

    [image: histogram with the first Gaussian component overlaid]
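The code above draws only the first component. To get all the Gaussians the question asks about, one possible approach is sketched below. It is not part of the original answer: the helper names component_densities and plot_mixture, the output file name gmm_hist.png, and the fixed random seed are my own choices for illustration. It evaluates every weighted component on an evenly spaced grid, which avoids sorting millions of points, and also draws their sum, which is the full mixture density.

```python
import numpy as np
from scipy import stats
from sklearn.mixture import GaussianMixture


def component_densities(gmm, grid):
    """Return an array of shape (n_components, len(grid)) holding each weighted component PDF."""
    means = gmm.means_.ravel()                # (n_components,)
    stds = np.sqrt(gmm.covariances_.ravel())  # valid for 1D data with covariance_type='full'
    weights = gmm.weights_                    # (n_components,)
    # Broadcast the grid against the per-component parameters.
    return weights[:, None] * stats.norm.pdf(grid[None, :], means[:, None], stds[:, None])


# Synthetic stand-in for the data file (same idea as in the answer above).
rng = np.random.default_rng(0)
data = np.concatenate((rng.normal(5, 5, 1000), rng.normal(10, 2, 1000)))

gmm = GaussianMixture(n_components=3, covariance_type="full", random_state=0)
gmm.fit(data.reshape(-1, 1))  # fit needs shape (n_samples, 1)

# Evenly spaced x-axis covering the data range.
grid = np.linspace(data.min(), data.max(), 1000)
parts = component_densities(gmm, grid)
total = parts.sum(axis=0)  # full mixture density


def plot_mixture(data, grid, parts, total, path="gmm_hist.png"):
    """Draw the histogram, each weighted component, and their sum, then save the figure."""
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.hist(data, bins=100, density=True, ec="red", alpha=0.5)
    for k, part in enumerate(parts):
        ax.plot(grid, part, label=f"component {k}")
    ax.plot(grid, total, c="black", lw=2, label="mixture")
    ax.legend()
    ax.grid()
    fig.savefig(path)


if __name__ == "__main__":
    plot_mixture(data, grid, parts, total)
```

For the real data, replace the synthetic array with the values read from prueba.dat, and call plt.show() instead of fig.savefig if you want an interactive window. Because each curve is multiplied by its weight, the components add up to the density that the normalized histogram approximates.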