pythondistributiongaussianmultimodal

Plot unimodal distributions determined from a multimodal distribution


I've used GaussianMixture to analyze a multimodal distribution. From the GaussianMixture class I can access the means and covariances using the attributes means_ and covariances_. How can I use them to now plot the two underlying unimodal distributions?

I thought of using scipy.stats.norm but I don't know what to select as parameters for loc and scale. The desired output would be analogously as shown in the attached figure.

The example code of this question was modified from the answer here.

import numpy as np
import matplotlib.pyplot as plt
from sklearn import mixture
from scipy.stats import norm

ls = np.linspace(0, 60, 1000)
multimodal_norm = norm.pdf(ls, 0, 5) + norm.pdf(ls, 20, 10)
plt.plot(ls, multimodal_norm)

# concatenate ls and multimodal to form an array of samples
# the shape is [n_samples, n_features]
# we reshape them to create an additional axis and concatenate along it
samples = np.concatenate([ls.reshape((-1, 1)), multimodal_norm.reshape((-1,1))], axis=-1)
print(samples.shape)

gmix = mixture.GaussianMixture(n_components = 2, covariance_type = "full")
fitted = gmix.fit(samples)

print(fitted.means_)
print(fitted.covariances_)

# The idea is something like the following (not working):
new_norm1 = norm.pdf(ls, fitted.means_, fitted.covariances_)
new_norm2 = norm.pdf(ls, fitted.means_, fitted.covariances_)
plt.plot(ls, new_norm1, label='Norm 1')
plt.plot(ls, new_norm2, label='Norm 2')

Solution

  • It is not entirely clear what you are trying to accomplish. You are fitting a GaussianMixture model to the concatenation of the sum of the values of pdfs of two gaussians sampled on a uniform grid, and the unifrom grid itself. This is not how a Gaussian Mixture model is meant to be fitted. Typically one fits a model to random observations drawn from some distribution (typically unknown but could be a simulated one).

    Let me assume that you want to fit the GaussianMixture model to a sample drawn from a Gaussian Mixture distribution. Presumably to test how well the fit works given you know what the expected outcome is. Here is the code for doing this, both to simulate the right distribution and to fit the model. It prints the parameters that the fit recovered from the sample -- we observe that they are indeed close to the ones we used to simulate the sample. Plot of the density of the GaussianMixture distribution that fits to the data is generated at the end

    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn import mixture
    from scipy.stats import norm
    
    # set simulation parameters
    mean1, std1, w1 = 0,5,0.5
    mean2, std2, w2 = 20,10,1-w1
    
    # simulate constituents
    n_samples = 100000
    np.random.seed(2021)
    gauss_sample_1 = np.random.normal(loc = mean1,scale = std1,size = n_samples)
    gauss_sample_2 = np.random.normal(loc = mean2,scale = std2,size = n_samples)
    binomial = np.random.binomial(n=1, p=w1, size = n_samples)
    
    # simulate gaussian mixture
    mutlimodal_samples = (gauss_sample_1 * binomial + gauss_sample_2 * (1-binomial)).reshape(-1,1)
    
    # define and fit the mixture model
    gmix = mixture.GaussianMixture(n_components = 2, covariance_type = "full")
    fitted = gmix.fit(mutlimodal_samples)
    
    print('fitted means:',fitted.means_[0][0],fitted.means_[1][0])
    print('fitted stdevs:',np.sqrt(fitted.covariances_[0][0][0]),np.sqrt(fitted.covariances_[1][0][0]))
    print('fitted weights:',fitted.weights_)
    
    # Plot component pdfs and a joint pdf
    ls = np.linspace(-50, 50, 1000)
    new_norm1 = norm.pdf(ls, fitted.means_[0][0], np.sqrt(fitted.covariances_[0][0][0]))
    new_norm2 = norm.pdf(ls, fitted.means_[1][0], np.sqrt(fitted.covariances_[1][0][0]))
    multi_pdf = w1*new_norm1 + (1-w1)*new_norm2
    plt.plot(ls, new_norm1, label='Norm pdf 1')
    plt.plot(ls, new_norm2, label='Norm pdf 2')
    plt.plot(ls, multi_pdf, label='multi-norm pdf')
    plt.legend(loc = 'best')
    plt.show()
    
    

    The results are

    fitted means: 22.358448018824642 0.8607494960575028
    fitted stdevs: 8.770962351118127 5.58538485134623
    fitted weights: [0.42517515 0.57482485]
    

    as we see they are close (up to their ordering, which of course the model cannot recover but it is irrelevant anyway) to what went into the simulation:

    mean1, std1, w1 = 0,5,0.5
    mean2, std2, w2 = 20,10,1-w1
    

    And the plot of the density and its parts. Recall that the pdf of the GaussianMixture is not the sum of the pdfs but a weighted average with weights w1, 1-w1:

    mixture-pdf