rpvclust

Can pvclust cluster observations rather than variables in R?


Take this as an example:

library("MASS")
library("pvclust")
result.par <- pvclust(Boston, nboot=1000, parallel=TRUE)
plot(result.par)

Here pvclust clusters the variables. Is it possible to group the observations into clusters instead?

That is, I want output like this (with a cluster variable):

 id    crim   zn indus chas   nox    rm   age    dis rad tax ptratio  black lstat medv cluster
1   1 0.00632 18.0  2.31    0 0.538 6.575  65.2 4.0900   1 296    15.3 396.90  4.98 24.0       1
2   2 0.02731  0.0  7.07    0 0.469 6.421  78.9 4.9671   2 242    17.8 396.90  9.14 21.6       2
3   3 0.02729  0.0  7.07    0 0.469 7.185  61.1 4.9671   2 242    17.8 392.83  4.03 34.7       1
4   4 0.03237  0.0  2.18    0 0.458 6.998  45.8 6.0622   3 222    18.7 394.63  2.94 33.4       2
5   5 0.06905  0.0  2.18    0 0.458 7.147  54.2 6.0622   3 222    18.7 396.90  5.33 36.2       3
6   6 0.02985  0.0  2.18    0 0.458 6.430  58.7 6.0622   3 222    18.7 394.12  5.21 28.7       3
7   7 0.08829 12.5  7.87    0 0.524 6.012  66.6 5.5605   5 311    15.2 395.60 12.43 22.9       1
8   8 0.14455 12.5  7.87    0 0.524 6.172  96.1 5.9505   5 311    15.2 396.90 19.15 27.1       1
9   9 0.21124 12.5  7.87    0 0.524 5.631 100.0 6.0821   5 311    15.2 386.63 29.93 16.5       2
10 10 0.17004 12.5  7.87    0 0.524 6.004  85.9 6.5921   5 311    15.2 386.71 17.10 18.9       2

How can I assign clusters to the observations?


Solution

  • The Mclust function from the mclust package is a good option.

    library("MASS")
    library("mclust")
    result.par <- Mclust(Boston)
    head(cbind(Boston, cluster=result.par$classification)) 
    

    https://cran.r-project.org/web/packages/mclust/vignettes/mclust.html

    You can also visualize the clusters with a heatmap. Suppressing the row dendrogram keeps the observations in their original order, so only the features are clustered, which makes the plot easier to read. Note that Mclust performs model-based (Gaussian mixture) clustering, so the results will differ somewhat from those of hierarchical clustering approaches.

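    library(NMF)
    # Boston plus the Mclust classification as a 15th column
    Boston_2 <- cbind(Boston, cluster = result.par$classification)
    aheatmap(as.matrix(Boston_2[,-15]), # remove cluster from data
             annRow = as.character(Boston_2[,15]), # consider cluster for annotating rows
             Rowv = NA) # no row dendrogram or reordering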
    

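If you would rather stay with a hierarchical approach, here is a minimal sketch using base R's hclust and cutree on the rows of Boston. The scaling step, the "average" linkage and k = 3 are illustrative assumptions, not choices taken from the question. The commented lines show how the same idea might be applied with pvclust: it clusters the columns of its input, so transposing the data makes the observations the clustered objects.

```r
library(MASS)  # provides the Boston data set

# Standardise the variables so that no single column dominates the distances.
boston_scaled <- scale(Boston)

# Rows are observations, so dist() measures distances between observations.
hc <- hclust(dist(boston_scaled), method = "average")

# Cut the tree into k groups; k = 3 is an arbitrary illustrative choice.
k <- 3
Boston_hc <- cbind(Boston, cluster = cutree(hc, k = k))
head(Boston_hc)

# With pvclust, transpose so that observations become columns
# (this can be slow with 506 observations and many bootstrap replicates):
# library(pvclust)
# result.obs <- pvclust(t(boston_scaled), nboot = 100)
# Boston_pv  <- cbind(Boston, cluster = cutree(result.obs$hclust, k = k))
```

The resulting data frame has the same shape as the output requested in the question: the original 14 columns plus a cluster column with one label per observation.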