I fitted a Gaussian mixture model (GMM) with scikit-learn in Python as follows:
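import numpy as np
import matplotlib.pyplot as plt
from sklearn.mixture import GaussianMixture

x = df1['DNA_2']
y = df1['DNA_1']
X = np.column_stack((x, y))  # stack the two columns into an (n_samples, 2) array
mod2 = GaussianMixture(n_components=5, covariance_type='tied', random_state=2)  # build the GMM
mod2.fit(X)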
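Because the model is fitted with covariance_type='tied', all five components share one covariance matrix, so covariances_ is a single (n_features, n_features) array rather than one matrix per component. A quick way to check the attribute shapes for each covariance type is the sketch below; the synthetic data is a hypothetical stand-in for the DNA_2 / DNA_1 columns.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Hypothetical synthetic data standing in for df1[['DNA_2', 'DNA_1']].
rng = np.random.default_rng(0)
centers = np.array([[0, 0], [5, 5], [0, 5], [5, 0], [10, 10]])
X = np.vstack([rng.normal(c, 1.0, size=(200, 2)) for c in centers])

# Fit one model per covariance type and record the parameter shapes.
shapes = {}
for cov_type in ("full", "tied", "diag", "spherical"):
    gmm = GaussianMixture(n_components=5, covariance_type=cov_type, random_state=2).fit(X)
    shapes[cov_type] = (gmm.means_.shape, gmm.covariances_.shape)
    print(cov_type, "means_", gmm.means_.shape, "covariances_", gmm.covariances_.shape)
```

With 'tied', means_ is (5, 2) but covariances_ is only (2, 2); with 'full' it would be (5, 2, 2).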
I then use the model to predict a component for each point and plot the result:
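labels = mod2.predict(X)  # most likely component (0-4) for each point
df1['pred2'] = labels
fig, ax = plt.subplots(1, 1)
# colors maps a component index to a colour; it is defined elsewhere (not shown)
ax.scatter(x, y, c=df1['pred2'].apply(lambda k: colors[k]), s=0.5, alpha=0.2)
# density_estimation is a helper defined elsewhere (not shown); note that this line reassigns X
H, X, Y = density_estimation(x, y)
ax.contour(H, X, Y, 8, linewidths=0.5, cmap='viridis')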
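This produces the following plot: [figure: scatter of DNA_1 against DNA_2, coloured by predicted component, with density contours overlaid]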
I want to plot the Gaussian curves for the 5 populations. I know I can get the means from mod2.means_ and the covariances from mod2.covariances_ (both 2D arrays), but how do I plot these to get the curve for each population?
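In 2D each component is a surface rather than a curve, so one option is to evaluate each component's weighted Gaussian density, weights_[k] * N(x | means_[k], Sigma_k), on a grid and draw it as contours. A minimal sketch, assuming scipy is available (scikit-learn depends on it); the helper names component_covariances and component_densities are hypothetical, and the synthetic data stands in for df1. Since the model uses covariance_type='tied', the helper repeats the single shared matrix for every component.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture


def component_covariances(gmm):
    """Return one (n_features, n_features) covariance matrix per component."""
    k, d = gmm.means_.shape
    cov = gmm.covariances_
    if gmm.covariance_type == "full":
        return cov                                       # already (k, d, d)
    if gmm.covariance_type == "tied":
        return np.repeat(cov[np.newaxis], k, axis=0)     # one shared (d, d) matrix
    if gmm.covariance_type == "diag":
        return np.array([np.diag(c) for c in cov])       # (k, d) variances
    if gmm.covariance_type == "spherical":
        return np.array([np.eye(d) * c for c in cov])    # (k,) variances
    raise ValueError(f"unknown covariance_type {gmm.covariance_type!r}")


def component_densities(gmm, xx, yy):
    """Weighted density of each component on a meshgrid, shape (k, *xx.shape)."""
    grid = np.column_stack([xx.ravel(), yy.ravel()])
    covs = component_covariances(gmm)
    dens = [w * multivariate_normal(mean=m, cov=c).pdf(grid)
            for w, m, c in zip(gmm.weights_, gmm.means_, covs)]
    return np.array(dens).reshape(len(dens), *xx.shape)


if __name__ == "__main__":
    # Hypothetical synthetic data in place of df1['DNA_2'], df1['DNA_1'].
    rng = np.random.default_rng(0)
    centers = np.array([[0, 0], [5, 5], [0, 5], [5, 0], [10, 10]])
    X = np.vstack([rng.normal(c, 1.0, size=(200, 2)) for c in centers])
    mod2 = GaussianMixture(n_components=5, covariance_type="tied", random_state=2).fit(X)

    # Grid spanning the data.
    xx, yy = np.meshgrid(np.linspace(X[:, 0].min(), X[:, 0].max(), 200),
                         np.linspace(X[:, 1].min(), X[:, 1].max(), 200))
    dens = component_densities(mod2, xx, yy)

    fig, ax = plt.subplots()
    ax.scatter(X[:, 0], X[:, 1], s=0.5, alpha=0.2, color="grey")
    for k, z in enumerate(dens):
        # One set of contour lines per component, in its own colour.
        ax.contour(xx, yy, z, levels=4, colors=f"C{k}", linewidths=0.8)
    plt.show()
```

The per-component densities sum to the mixture density, which mod2.score_samples returns on the log scale, so exp(score_samples) is a convenient sanity check. To overlay the contours on the original figure, call component_densities(mod2, xx, yy) with a grid spanning x and y and contour each slice on the same ax.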
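If it is a 2D GMM like the one in your picture, there is no single curve per component; instead, plot each component's 2D density, for example as contours, as in https://pythonmachinelearning.pro/clustering-with-gaussian-mixture-models/ The line graph you attached is for a 1D GMM with three components. To reproduce that kind of plot, evaluate each component's probability density, weighted by its mixing weight, over a range of x values and plot one curve per cluster/group.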
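For the 1D case, a minimal sketch: curve k is weights_[k] * N(x | means_[k], variance_k), and the sum of the curves is the mixture density. The helper name component_pdfs_1d and the synthetic three-cluster data are assumptions for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm
from sklearn.mixture import GaussianMixture


def component_pdfs_1d(gmm, x):
    """Weighted pdf of each component of a 1D GMM at points x, shape (k, len(x))."""
    x = np.asarray(x, dtype=float)
    means = gmm.means_.ravel()
    # covariances_ is (k, 1, 1), (1, 1), (k, 1) or (k,) depending on covariance_type;
    # ravel + broadcast gives one variance per component in every case.
    variances = np.broadcast_to(gmm.covariances_.ravel(), means.shape)
    stds = np.sqrt(variances)
    return gmm.weights_[:, None] * norm.pdf(x[None, :], loc=means[:, None], scale=stds[:, None])


if __name__ == "__main__":
    # Hypothetical 1D data drawn from three clusters.
    rng = np.random.default_rng(0)
    data = np.concatenate([rng.normal(-4, 1, 300), rng.normal(0, 0.5, 200), rng.normal(3, 1.5, 500)])
    gmm = GaussianMixture(n_components=3, random_state=0).fit(data.reshape(-1, 1))

    x = np.linspace(data.min(), data.max(), 500)
    pdfs = component_pdfs_1d(gmm, x)

    fig, ax = plt.subplots()
    ax.hist(data, bins=60, density=True, alpha=0.3, color="grey")
    for k, p in enumerate(pdfs):
        ax.plot(x, p, label=f"component {k}")          # one Gaussian curve per component
    ax.plot(x, pdfs.sum(axis=0), "k--", label="mixture")  # total density
    ax.legend()
    plt.show()
```

Weighting each curve by weights_ keeps the component curves on the same scale as the histogram, so they add up to the dashed mixture curve.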