k-means, expectation-maximization

K-means as a special case of the generalized EM algorithm


I am using a dataset to make 2 clusters, first with EM and then with K-means. I have already implemented K-means and the EM algorithm separately. Now I am trying to derive k-means from my EM implementation to do the clustering. I have 2 questions in mind.

  1. K-means is viewed as a special case of the generalized EM algorithm. But what assumptions do we need to make to derive k-means from the EM algorithm?

  2. Also, from a coding perspective, what changes do we need to make to the implementation of the EM algorithm so that it behaves exactly like the k-means algorithm? I assume that both clusters need to share the same covariance matrix. Is that a correct assumption?

This is what I am getting using k-means:

[Image: k-means clustering result]

This is the clustering I get using EM:

[Image: EM clustering result]


Solution

  • K-means and EM clustering are closely related, but not exactly the same. Two changes to EM would make it very, very similar to K-means:

    1. EM fits multi-dimensional distributions. Constrain the standard deviations of each distribution to be the same in all dimensions (i.e., spherical covariance matrices).
    2. Revise the output of EM to report only the most probable cluster for each point. EM produces soft clusters (a separate probability that a point belongs to each cluster), whereas K-means produces hard clusters (a single cluster choice per point).

    I don't know how these "fixes" translate into your particular code; a rough sketch of what they look like in general is included below.

    I'm not 100% sure that such an EM approach would converge to exactly the same clusters as K-means under all circumstances, but I am confident that the two methods would produce very comparable results in most cases.
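
    To make those two changes concrete, here is a minimal Python sketch (not your implementation; the function em_as_kmeans and its parameter names are hypothetical) of an EM loop for a Gaussian mixture with equal mixing weights and a single spherical covariance sigma^2 * I shared by both clusters. With hard assignments in the E-step (equivalently, in the limit sigma^2 -> 0), the E-step becomes the nearest-centroid assignment and the M-step reduces to the k-means centroid update.

        import numpy as np

        def em_as_kmeans(X, n_clusters=2, n_iter=100, sigma2=1e-3, hard=True, seed=0):
            rng = np.random.default_rng(seed)
            # Initialise means from random data points, as k-means usually does.
            means = X[rng.choice(len(X), size=n_clusters, replace=False)]

            for _ in range(n_iter):
                # E-step: with a shared spherical covariance, each point's
                # log-likelihood under a cluster depends only on squared distance.
                sq_dist = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)

                if hard:
                    # k-means assignment: the closest mean gets responsibility 1.
                    resp = np.zeros_like(sq_dist)
                    resp[np.arange(len(X)), sq_dist.argmin(axis=1)] = 1.0
                else:
                    # Ordinary EM responsibilities (soft assignment). As sigma2
                    # shrinks these approach the hard assignment above.
                    log_resp = -0.5 * sq_dist / sigma2
                    log_resp -= log_resp.max(axis=1, keepdims=True)  # stability
                    resp = np.exp(log_resp)
                    resp /= resp.sum(axis=1, keepdims=True)

                # M-step: with the covariance fixed and the weights equal, the only
                # parameter left to update is the means -- the k-means centroid step.
                weights = np.maximum(resp.sum(axis=0, keepdims=True).T, 1e-12)
                means = (resp.T @ X) / weights

            return means, resp.argmax(axis=1)

        # Example: two well-separated blobs; with hard=True the result matches
        # a plain k-means run started from the same initial means.
        rng = np.random.default_rng(1)
        X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(5, 1, (100, 2))])
        centroids, labels = em_as_kmeans(X, n_clusters=2)

    In this restricted setting the only parameters EM updates are the means, which is why the hard-assignment version behaves like k-means; whether it matches your particular EM code depends on how that code parameterises the covariances and mixing weights.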