Tags: r, cluster-computing, mahalanobis

Clustering using a proximity matrix in R


I have a proximity (dissimilarity) matrix of Mahalanobis distances.

A sample of the matrix:

> dput(MD[1:5,1:5])

structure(c(0, 10.277, 8.552, 8.592, 9.059, 10.277, 0, 10.917, 
9.489, 8.176, 8.552, 10.917, 0, 8.491, 8.104, 8.592, 9.489, 8.491, 
0, 9.375, 9.059, 8.176, 8.104, 9.375, 0), .Dim = c(5L, 5L), .Dimnames = list(
    c("2", "4", "5", "6", "9"), c("X2", "X4", "X5", "X6", "X9"
    )))

The matrix covers 1900 people, and the row names are their IDs. I need to cluster these people and then get a cluster number next to each person's ID.

I know how to cluster using k-means, but I don't know how to cluster when I already have a dissimilarity matrix.


Solution

  • You can use hierarchical clustering, starting with the Mahalanobis distance matrix:

    MD
    #        X2     X4     X5    X6    X9
    # 2   0.000 10.277  8.552 8.592 9.059
    # 4  10.277  0.000 10.917 9.489 8.176
    # 5   8.552 10.917  0.000 8.491 8.104
    # 6   8.592  9.489  8.491 0.000 9.375
    # 9   9.059  8.176  8.104 9.375 0.000
    
    hc <- hclust(as.dist(MD))
    
    clusters <- cutree(hc, k = 3) # obtain 3 clusters
    clusters
    # 2 4 5 6 9 
    # 1 2 3 1 3 
    
    plot(hc)
    rect.hclust(hc, k = 3, border = "red")
    

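To get the cluster number next to each person's ID, as the question asks, one option is to turn the named vector returned by `cutree` into a data frame. This is a minimal sketch, assuming the row names of `MD` are the IDs (as stated in the question); the name `result` is only illustrative.

```r
# Sample matrix from the question (the full MD would have 1900 rows).
MD <- structure(c(0, 10.277, 8.552, 8.592, 9.059, 10.277, 0, 10.917,
9.489, 8.176, 8.552, 10.917, 0, 8.491, 8.104, 8.592, 9.489, 8.491,
0, 9.375, 9.059, 8.176, 8.104, 9.375, 0), .Dim = c(5L, 5L), .Dimnames = list(
    c("2", "4", "5", "6", "9"), c("X2", "X4", "X5", "X6", "X9")))

hc <- hclust(as.dist(MD))      # default linkage method is "complete"
clusters <- cutree(hc, k = 3)  # named integer vector; names come from the row names of MD

# One row per person: the ID from the names, the cluster from the values.
# `result` is a hypothetical name chosen for this example.
result <- data.frame(id = names(clusters),
                     cluster = unname(clusters),
                     stringsAsFactors = FALSE)
result
```

The same lines should apply unchanged to the full 1900-row matrix; only `k` (or a cut height passed to `cutree` as `h`) needs to be chosen to suit the data.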