I'm taking Andrew Ng's course on Machine Learning on Coursera. While discussing clustering, he tells us that K- means clustering algorithm is the most widely used. I've also used Kruskal's algorithm for clustering earlier which was a very efficient algorithm with path compression and rank-based unions. What makes K-means better than Kruskal's algorithm?
Kruskal’s algorithm and k-means clustering will usually generate very different clusters, since they’re optimized to find different things.
As an example, consider n points on a line that are more or less evenly spaced out, except that each point is ever so slightly further away from the point on its right than the point on its left. That is, if you zoom out, you more or less see n evenly-spaced points, but on zooming in you’ll see that the distances aren’t exactly the same and increase from left to right.
Kruskal’s algorithm finds a maximum-separation clustering, which means that it splits nodes apart so that the distances between the clusters are as large as possible. In this case, what would a maximum-separation clustering look like with k=2? Since the distances increase as we move from left to right, it’ll find a clustering of “everything except the rightmost node” and “the rightmost node.”
K-means clustering, on the other hand, finds a clustering that minimizes within-cluster variance, which means that it groups nodes so that the clustered nodes are generally close to one another. Running k-means on the above data set will split the points roughly in half down the center line, returning two clusters that are about the same size.
So which is a “better” clustering? It depends on your application. I’d suspect that more often than not we’d like this second clustering because we want nodes in a cluster to be as similar to one another as possible. That’s why we often see k-means clustering used more than Kruskal’s algorithm, though there are still cases where Kruskal is nice to have.
Note that this concern is orthogonal to efficiency. Yes, Kruskal’s algorithm is very fast, but it’s computing something different than what k-means computes.
Hope this helps!