I've been trying out the kmeans clustering algorithm implementation in scipy. Are there any standard, well-defined metrics that could be used to measure the quality of the clusters generated?
That is, I have the expected labels for the data points that are clustered by kmeans. Once I get the generated clusters, how do I evaluate their quality with respect to the expected labels?
I am doing this very thing at the moment with Spark's KMeans.
I am using:
The sum of squared distances of points to their nearest center (implemented in computeCost()).
The Unbalanced factor (see Unbalanced factor of KMeans? for an implementation and Understanding the quality of the KMeans algorithm for an explanation).
Both quantities indicate a better clustering when they are small (the smaller, the better).
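For readers working outside Spark, the first quantity is easy to compute by hand. Here is a minimal NumPy sketch of the sum of squared distances of points to their nearest center. The function name `within_cluster_sse` and the toy data are made up for illustration, and this is not Spark's `computeCost()` itself, only the same idea under the assumption of plain Euclidean distance. The centers could come from any k-means run, for instance scipy's.

```python
import numpy as np

def within_cluster_sse(points, centers):
    """Sum of squared Euclidean distances from each point to its nearest center.

    Hypothetical helper for illustration; assumes plain Euclidean distance.
    """
    points = np.asarray(points, dtype=float)
    centers = np.asarray(centers, dtype=float)
    # Pairwise squared distances, shape (n_points, n_centers).
    diffs = points[:, None, :] - centers[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=2)
    # Keep only the distance to the nearest center for each point, then sum.
    return float(np.sum(np.min(sq_dists, axis=1)))

# Toy data (made up): two well-separated pairs of points.
points = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
good_centers = np.array([[0.0, 1.0], [10.0, 1.0]])   # centroids of each pair
worse_centers = np.array([[0.0, 0.0], [10.0, 0.0]])  # centers sitting on data points

print(within_cluster_sse(points, good_centers))   # 4.0
print(within_cluster_sse(points, worse_centers))  # 8.0
```

The lower value for `good_centers` reflects the "smaller is better" reading above. Note that this cost always shrinks as the number of clusters grows, so it is mainly useful for comparing runs with the same k.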