python, mixture-model, gmm, pomegranate

How to get mean and covariance values from a pomegranate Gaussian Mixture model


In the scikit-learn Gaussian mixture model we can get the mean and covariance with:

from sklearn.mixture import GaussianMixture

clf = GaussianMixture(n_components=num_clusters, covariance_type="tied", init_params='kmeans')
clf.fit(X)  # the means_/covariances_ attributes exist only after fitting
for i in range(clf.n_components):
    cov = clf.covariances_[i]
    mean = clf.means_[i]

But the pomegranate Gaussian Mixture model says there are no attributes called 'covariances_' or 'means_'. Thank you very much for your valuable time.


Solution

  • When you run covariance_type="tied", the model assumes a common covariance matrix for all components, so the loop above does not hold: with covariance_type="tied" there is only one covariance matrix stored under clf.covariances_. From the scikit-learn documentation:

    ‘full’ each component has its own general covariance matrix

    ‘tied’ all components share the same general covariance matrix
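
    For instance, a quick shape check on the iris data (a minimal sketch; the attribute shapes follow the sklearn behaviour described above) makes the difference concrete:

    from sklearn import datasets
    from sklearn.mixture import GaussianMixture

    iris = datasets.load_iris()

    # "tied": a single covariance matrix shared by all components
    tied = GaussianMixture(n_components=3, covariance_type="tied").fit(iris.data)
    print(tied.covariances_.shape)   # (4, 4)

    # "full": one covariance matrix per component
    full = GaussianMixture(n_components=3, covariance_type="full").fit(iris.data)
    print(full.covariances_.shape)   # (3, 4, 4)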

    pomegranate estimates a covariance matrix for each component, so a good comparison is running GaussianMixture from sklearn with covariance_type="full":

    from sklearn import datasets
    from sklearn.mixture import GaussianMixture
    
    iris = datasets.load_iris()
    
    clf = GaussianMixture(n_components=3, covariance_type="full", init_params='kmeans')
    clf.fit(iris.data)
    cov = []
    means = []
    for i in range(clf.n_components):
        cov.append(clf.covariances_[i])
        means.append(clf.means_[i])
    

    So for component (cluster) 0:

    means[0]
    
    array([5.006, 3.428, 1.462, 0.246])
    
    cov[0]
    
    array([[0.121765, 0.097232, 0.016028, 0.010124],
           [0.097232, 0.140817, 0.011464, 0.009112],
           [0.016028, 0.011464, 0.029557, 0.005948],
           [0.010124, 0.009112, 0.005948, 0.010885]])
    

    Now using pomegranate:

    import numpy as np
    from pomegranate import GeneralMixtureModel, MultivariateGaussianDistribution
    
    mdl = GeneralMixtureModel.from_samples(MultivariateGaussianDistribution,
                                           n_components=3, X=iris.data)
    mdl = mdl.fit(iris.data)
    

    The parameters can be accessed under distributions, which is a list as long as the number of components: for the first component you use distributions[0], for the second distributions[1], and so on. Within each distribution, parameters[0] holds the mean vector and parameters[1] the covariance matrix:

    mdl.distributions[0].parameters[0]
    
    [5.005999999999999, 3.4280000000000004, 1.462, 0.24599999999999986]
    
    np.round(mdl.distributions[0].parameters[1],6)
    
    array([[0.121764, 0.097232, 0.016028, 0.010124],
           [0.097232, 0.140816, 0.011464, 0.009112],
           [0.016028, 0.011464, 0.029556, 0.005948],
           [0.010124, 0.009112, 0.005948, 0.010884]])
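    

    Putting this together, you can collect all means and covariances from the pomegranate model with a loop analogous to the sklearn one above (a short sketch, assuming mdl was fit as shown and numpy is imported as np):

    means = []
    covs = []
    for d in mdl.distributions:
        means.append(np.array(d.parameters[0]))   # mean vector of the component
        covs.append(np.array(d.parameters[1]))    # covariance matrix of the component

    means[0] and covs[0] then match the sklearn values shown above up to rounding.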