python gaussian gmm

Plot gaussian sub-populations from GMM model


I built a GMM using scikit-learn in Python as follows:

import numpy as np
from sklearn.mixture import GaussianMixture

x = df1['DNA_2']
y = df1['DNA_1']
X = np.column_stack((x, y)) # create a 2D array from the two columns
mod2 = GaussianMixture(n_components=5, covariance_type='tied', random_state=2) # build the GMM
mod2.fit(X)

I then use this model to make predictions which I then plot:

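import matplotlib.pyplot as plt

labels = mod2.predict(X) # predicted component index for each point
df1['pred2'] = labels
fig, ax = plt.subplots(1,1)
# colors (a per-label color lookup) and density_estimation are my own helpers, not shown here
ax.scatter(x, y, c=df1['pred2'].apply(lambda x: colors[x]), s = 0.5, alpha=0.2)
H,X,Y = density_estimation(x,y)
ax.contour(H, X, Y, 8, linewidths=0.5, cmap='viridis')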

to get:

plot

I want to know how to plot the Gaussian curves for the 5 populations. I know I can get the means from mod2.means_ and the covariances from mod2.covariances_ (both 2D arrays), but how do I plot these to get a curve for each population?

I'm looking to get something like this: [image]


Solution

  • If it's a 2D GMM, as in your scatter plot, the components are best shown with a 2D density plot (for example, density contours), as in: https://pythonmachinelearning.pro/clustering-with-gaussian-mixture-models/ The line graph you attached is for a 1D GMM with three components. To draw that kind of graph, evaluate the probability density of each component (scaled by its mixing weight) over a range of x values and plot one curve per cluster/group.
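A minimal sketch of both ideas, under some assumptions: the data is synthetic (three made-up blobs standing in for df1), and the helper names component_covariances, marginal_curves, component_densities_2d and plot_components are hypothetical, not part of scikit-learn. It relies on the fact that the marginal of a 2D Gaussian along one axis is a 1D Gaussian with the matching mean entry and diagonal covariance entry, so each component gives one weighted curve per axis as well as one set of 2D contours.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import multivariate_normal, norm
from sklearn.mixture import GaussianMixture


def component_covariances(gmm):
    """Return one (d, d) covariance matrix per component for any covariance_type."""
    k, d = gmm.means_.shape
    c = gmm.covariances_
    if gmm.covariance_type == "tied":       # a single (d, d) matrix shared by all components
        return np.tile(c, (k, 1, 1))
    if gmm.covariance_type == "full":       # already (k, d, d)
        return c
    if gmm.covariance_type == "diag":       # (k, d) variances
        return np.array([np.diag(v) for v in c])
    return np.array([v * np.eye(d) for v in c])  # spherical: (k,) variances


def marginal_curves(gmm, axis, grid):
    """Weighted 1D density of each component along one feature axis."""
    covs = component_covariances(gmm)
    return np.array([
        w * norm.pdf(grid, loc=m[axis], scale=np.sqrt(S[axis, axis]))
        for w, m, S in zip(gmm.weights_, gmm.means_, covs)
    ])


def component_densities_2d(gmm, xx, yy):
    """Weighted 2D density of each component evaluated on a meshgrid."""
    covs = component_covariances(gmm)
    pts = np.dstack((xx, yy))  # shape (ny, nx, 2)
    return np.array([
        w * multivariate_normal(mean=m, cov=S).pdf(pts)
        for w, m, S in zip(gmm.weights_, gmm.means_, covs)
    ])


# Synthetic stand-in for the real data (assumption: three well-separated blobs).
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [4.0, 1.0], [8.0, 3.0]])
X = np.vstack([rng.normal(c, 0.7, size=(500, 2)) for c in centers])

gmm = GaussianMixture(n_components=3, covariance_type="tied", random_state=2).fit(X)

# 1D curves along the first feature, and 2D densities on a grid.
grid = np.linspace(X[:, 0].min() - 3, X[:, 0].max() + 3, 400)
curves = marginal_curves(gmm, axis=0, grid=grid)

xs = np.linspace(X[:, 0].min() - 3, X[:, 0].max() + 3, 120)
ys = np.linspace(X[:, 1].min() - 3, X[:, 1].max() + 3, 120)
xx, yy = np.meshgrid(xs, ys)
dens2d = component_densities_2d(gmm, xx, yy)


def plot_components():
    """Left: scatter with per-component contours. Right: per-component 1D curves."""
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
    ax1.scatter(X[:, 0], X[:, 1], s=0.5, alpha=0.2)
    for dens in dens2d:
        ax1.contour(xx, yy, dens, levels=4, linewidths=0.5)
    ax2.hist(X[:, 0], bins=100, density=True, alpha=0.3)
    for k, curve in enumerate(curves):
        ax2.plot(grid, curve, label=f"component {k}")
    ax2.plot(grid, curves.sum(axis=0), "k--", label="mixture")
    ax2.legend()
    return fig
```

Calling plot_components() followed by plt.show() draws both panels. For the model in the question, the same helpers should work with mod2 in place of gmm and the real X in place of the synthetic one; pass axis=0 for DNA_2 or axis=1 for DNA_1, given the column order used in np.column_stack.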