validation, cluster-analysis, cohesion

How do you manually compute the silhouette, cohesion, and separation of clusters?


Good day!

I have been looking all over the Internet for how to compute the silhouette coefficient, cohesion, and separation. Unfortunately, despite the resources, I just can't understand the formulas posted. I know that some tools implement these measures, but I want to know how to compute them manually, especially given a vector space model.

Assuming that I have the following clusters:

Cluster 1 = {{1,0},{1,1}}
Cluster 2 = {{1,2},{2,3},{2,2},{1,2}}
Cluster 3 = {{3,1},{3,3},{2,1}}

The way I understand it from [1], I first have to compute the mean (centroid) of the points in each cluster:

C1 X = 1; Y = .5
C2 X = 1.5; Y = 2.25
C3 X = 2.67; Y = 1.67

Given the means, I then compute the cohesion of each cluster as the sum of squared errors (SSE):

Cohesion(C1) = (1-1)^2 + (1-1)^2 + (0-.5)^2 + (1-.5)^2 = 0.5
Cohesion(C2) = (1-1.5)^2 + (2-1.5)^2 + (2-1.5)^2 + (1-1.5)^2 + (2-2.25)^2 + (3-2.25)^2 + (2-2.25)^2 + (2-2.25)^2 = 1.75
Cohesion(C3) = (3-2.67)^2 + (3-2.67)^2 + (2-2.67)^2 + (1-1.67)^2 + (3-1.67)^2 + (1-1.67)^2 = 3.3334

Cluster(C) = 0.5 + 1.75 + 3.3334 = 5.5834
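
As a cross-check on the arithmetic above, here is a minimal Python sketch that recomputes the centroids and the per-cluster SSE from the raw points. The names `clusters`, `centroid`, `sse`, `cohesion`, and `total_sse` are illustrative choices, not taken from [1], and the sketch assumes plain Euclidean distance in two dimensions.

```python
# Clusters copied from the question; each point is an (x, y) pair.
clusters = {
    "C1": [(1, 0), (1, 1)],
    "C2": [(1, 2), (2, 3), (2, 2), (1, 2)],
    "C3": [(3, 1), (3, 3), (2, 1)],
}

def centroid(points):
    """Coordinate-wise mean of a list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def sse(points):
    """Sum of squared Euclidean distances from each point to the centroid."""
    cx, cy = centroid(points)
    return sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points)

# Per-cluster cohesion (SSE) and the total over all clusters.
cohesion = {name: sse(pts) for name, pts in clusters.items()}
total_sse = sum(cohesion.values())

for name, value in cohesion.items():
    print(name, centroid(clusters[name]), round(value, 4))
print("total", round(total_sse, 4))
```

With these points the script gives 0.5, 1.75, and about 3.3333 for the three clusters, for a total of about 5.5833. The C2 value depends on using the y-mean 2.25 in every y-term.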

My questions are:
1. Did I compute cohesion correctly?
2. How do I compute separation?
3. How do I compute the silhouette coefficient?

Thank you.


References:
[1] http://www.cs.kent.edu/~jin/DM08/ClusterValidation.pdf


Solution

  • Computing the silhouette is straightforward, but it does not involve the centroids: it is defined in terms of the distances between the individual data points.

    So don't try to compute it from what you did for cohesion; compute it from your original data.
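
    To make this concrete, here is a minimal Python sketch of the usual silhouette definition applied to the clusters from the question. For each point, a is the mean distance to the other points in its own cluster, b is the smallest mean distance to the points of any other cluster, and s = (b - a) / max(a, b). The function names, the use of Euclidean distance, and the convention that a point alone in its cluster gets s = 0 are assumptions of this sketch. It needs Python 3.8 or later for `math.dist`.

```python
import math

# Clusters from the question, as lists of (x, y) points.
clusters = [
    [(1, 0), (1, 1)],
    [(1, 2), (2, 3), (2, 2), (1, 2)],
    [(3, 1), (3, 3), (2, 1)],
]

def mean_distance(p, points):
    """Average Euclidean distance from p to every point in points."""
    return sum(math.dist(p, q) for q in points) / len(points)

def silhouette_values(clusters):
    """Return a list of (cluster_index, point, s) for every point."""
    results = []
    for ci, own in enumerate(clusters):
        for pi, p in enumerate(own):
            # Exclude the point itself by index, so duplicates are kept.
            rest = own[:pi] + own[pi + 1:]
            if not rest:
                # Assumed convention: a singleton cluster scores 0.
                results.append((ci, p, 0.0))
                continue
            a = mean_distance(p, rest)
            b = min(
                mean_distance(p, other)
                for cj, other in enumerate(clusters)
                if cj != ci
            )
            denom = max(a, b)
            s = (b - a) / denom if denom > 0 else 0.0
            results.append((ci, p, s))
    return results

values = silhouette_values(clusters)
for ci, p, s in values:
    print(f"cluster {ci + 1}, point {p}: s = {s:.4f}")

# The overall silhouette coefficient is the mean over all points.
overall = sum(s for _, _, s in values) / len(values)
print(f"overall: {overall:.4f}")
```

    Note that only the raw points and their cluster labels enter the computation. In this definition, a plays the role of cohesion and b the role of separation, but both are averages of point-to-point distances rather than distances to a centroid.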