[SOLVED] Python sklearn GaussianMixture: how to get the samples/points in each cluster

I am using a GMM to cluster my dataset into K groups. The model runs well, but I can't find a way to get the raw data belonging to each cluster. Can you suggest how to solve this? Thank you so much.

Solution

You can do it like this: store the predicted labels as a column and filter the rows on it (look at d0, d1, and d2).

import pandas as pd
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.mixture import GaussianMixture

# load the iris dataset 
iris = datasets.load_iris() 

# select first two columns  
X = iris.data[:, 0:2] 

# turn it into a dataframe 
d = pd.DataFrame(X) 

# plot the data 
plt.scatter(d[0], d[1]) 

gmm = GaussianMixture(n_components = 3) 

# Fit the GMM model for the dataset  
# which expresses the dataset as a  
# mixture of 3 Gaussian Distribution 
gmm.fit(d) 

# Assign a label to each sample 
labels = gmm.predict(d) 
d['labels'] = labels

# here is a possible solution for you:
# each frame below holds the raw rows assigned to one cluster
d0 = d[d['labels'] == 0]
d1 = d[d['labels'] == 1]
d2 = d[d['labels'] == 2]
print(d0)
print(d1)
print(d2)

# plot three clusters in same plot 
plt.scatter(d0[0], d0[1], c ='r') 
plt.scatter(d1[0], d1[1], c ='yellow') 
plt.scatter(d2[0], d2[1], c ='g')

# print the lower bound on the log-likelihood reached by the best fit
print(gmm.lower_bound_) 

# print the number of iterations needed 
# for the log-likelihood value to converge 
print(gmm.n_iter_)

# in this run it needed 8 iterations for the log-likelihood to converge;
# the count may differ between runs because no random_state is set.
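
Since the question asks about K groups in general, here is a minimal sketch of the same idea without hard-coding one variable per cluster. It works directly on the NumPy array with a boolean mask. The names K and clusters, and the choice of random_state=0, are my own assumptions for illustration, not part of the answer above.

```python
import numpy as np
from sklearn import datasets
from sklearn.mixture import GaussianMixture

# Same data as above: the first two iris features.
X = datasets.load_iris().data[:, 0:2]

K = 3  # number of clusters (assumed here; use your own K)

# random_state is fixed only to make this sketch repeatable.
gmm = GaussianMixture(n_components=K, random_state=0).fit(X)
labels = gmm.predict(X)

# Map each cluster id to the raw rows assigned to it.
clusters = {k: X[labels == k] for k in range(K)}

for k, points in clusters.items():
    print(f"cluster {k}: {len(points)} samples")
```

If you also need the positions of those rows in the original dataset, np.flatnonzero(labels == k) returns the row indices for cluster k.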