
Hierarchical Clustering in R - 'pvclust' Issues


I have made a reproducible example where I am having trouble with pvclust. My goal is to pick the ideal clusters in a hierarchical cluster dendrogram. I've heard of 'pvclust' but can't figure out how to use it. Also, if anyone has other suggestions for determining the ideal clusters, that would be really helpful.

My code is below.

library(pvclust)    

employee<- c('A','B','C','D','E','F','G','H','I',
         'J','K','L','M','N','O','P',
         'Q','R','S','T',
         'U','V','W','X','Y','Z')   
salary<-c(20,30,40,50,20,40,23,05,56,23,15,43,53,65,67,23,12,14,35,11,10,56,78,23,43,56) 
# data.frame() keeps salary numeric; cbind() would coerce both columns to character
testing90<-data.frame(employee,salary)
head(testing90)
# Keep only the salary column, with employees as row names
testing91<-data.frame(salary=salary, row.names=employee)
head(testing91)
d<-dist(as.matrix(testing91))
hc<-hclust(d,method = "ward.D2")
hc
plot(hc)

par(cex=0.6, mar=c(5, 8, 4, 1))
plot(hc, xlab="", ylab="", main="", sub="", axes=FALSE)
par(cex=1)
title(xlab="Publishers", main="Hierarchical Cluster of Publishers by eCPM")
axis(2)

fit<-pvclust(d, method.hclust="ward.D2", nboot=1000, method.dist="eucl") 

An error came up stating:

Error in names(edges.cnt) <- paste("r", 1:rl, sep = "") : 
  'names' attribute [2] must be the same length as the vector [0]

Solution

  • One fix is to convert your object d into a matrix, because pvclust expects a numeric matrix or data frame rather than an object of class dist.

    From the pvclust help file:

    data: numeric data matrix or data frame.

    Note that a dist object stores only the lower triangle of the distance matrix. Converting it with as.matrix() mirrors those values across the diagonal, giving a full symmetric matrix with zeros on the diagonal. You can inspect the object that pvclust will actually receive with the call:

    as.matrix(d)
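
    To see what that conversion does, here is a small sketch on toy data (the values and the names x, d_toy, and m_toy are illustrative, not taken from the question):

    ```r
    # Toy data: four observations of a single numeric variable.
    x <- matrix(c(20, 30, 40, 50), ncol = 1,
                dimnames = list(c("A", "B", "C", "D"), "salary"))

    d_toy <- dist(x)           # class "dist": only the pairwise distances are stored
    m_toy <- as.matrix(d_toy)  # full square matrix with row and column labels

    class(d_toy)        # the class pvclust was given in the failing call
    length(d_toy)       # n * (n - 1) / 2 pairwise distances
    dim(m_toy)          # n rows and n columns
    isSymmetric(m_toy)  # the lower triangle is mirrored into the upper one
    diag(m_toy)         # distance of each observation to itself
    ```

    The square matrix has one column per observation, which is why the employee labels survive the conversion.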
    

    This would be the call you are looking for:

    #note that I can't 
    pvclust(as.matrix(d), method.hclust="ward.D2", nboot=1000, method.dist="eucl")
    #Bootstrap (r = 0.5)... Done.
    #Bootstrap (r = 0.58)... Done.
    #Bootstrap (r = 0.69)... Done.
    #Bootstrap (r = 0.77)... Done.
    #Bootstrap (r = 0.88)... Done.
    #Bootstrap (r = 1.0)... Done.
    #Bootstrap (r = 1.08)... Done.
    #Bootstrap (r = 1.19)... Done.
    #Bootstrap (r = 1.27)... Done.
    #Bootstrap (r = 1.38)... Done.
    #
    #Cluster method: ward.D2
    #Distance      : euclidean
    #
    #Estimates on edges:
    #
    #      au    bp se.au se.bp      v      c  pchi
    #1  1.000 1.000 0.000 0.000  0.000  0.000 0.000
    #2  1.000 1.000 0.000 0.000  0.000  0.000 0.000
    #3  1.000 1.000 0.000 0.000  0.000  0.000 0.000
    #4  1.000 1.000 0.000 0.000  0.000  0.000 0.000
    #5  1.000 1.000 0.000 0.000  0.000  0.000 0.000
    #6  1.000 1.000 0.000 0.000  0.000  0.000 0.000
    #7  1.000 1.000 0.000 0.000  0.000  0.000 0.000
    #8  1.000 1.000 0.000 0.000  0.000  0.000 0.000
    #9  1.000 1.000 0.000 0.000  0.000  0.000 0.000
    #10 1.000 1.000 0.000 0.000  0.000  0.000 0.000
    #11 1.000 1.000 0.000 0.000  0.000  0.000 0.000
    #12 1.000 1.000 0.000 0.000  0.000  0.000 0.000
    #13 1.000 1.000 0.000 0.000  0.000  0.000 0.000
    #14 1.000 1.000 0.000 0.000  0.000  0.000 0.000
    #15 1.000 1.000 0.000 0.000  0.000  0.000 0.000
    #16 1.000 1.000 0.000 0.000  0.000  0.000 0.000
    #17 1.000 1.000 0.000 0.000  0.000  0.000 0.000
    #18 1.000 1.000 0.000 0.000  0.000  0.000 0.000
    #19 0.853 0.885 0.022 0.003 -1.126 -0.076 0.058
    #20 0.854 0.885 0.022 0.003 -1.128 -0.073 0.069
    #21 0.861 0.897 0.022 0.003 -1.176 -0.090 0.082
    #22 0.840 0.886 0.024 0.003 -1.100 -0.106 0.060
    #23 0.794 0.690 0.023 0.005 -0.658  0.162 0.591
    #24 0.828 0.686 0.020 0.005 -0.716  0.232 0.704
    #25 1.000 1.000 0.000 0.000  0.000  0.000 0.000
    

    Note that this fixes your call, but whether the clustering method is valid and whether your data are of sufficient quality is for you to decide. I took your MRE at face value.
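
    Once the call runs, the usual next step is to read the AU p-values off the dendrogram and pick the well-supported clusters. What follows is a sketch rather than a recommendation: it assumes the pvclust package is installed, rebuilds the distance matrix directly from the question's salary values, and uses alpha = 0.95 purely as an illustrative threshold. Keep in mind that pvclust clusters the columns of its input and computes its own distances, so passing the square matrix groups employees by their columns of that matrix instead of reusing d itself.

    ```r
    library(pvclust)

    # Salary values from the question, labelled A to Z.
    salary <- c(20, 30, 40, 50, 20, 40, 23, 5, 56, 23, 15, 43, 53,
                65, 67, 23, 12, 14, 35, 11, 10, 56, 78, 23, 43, 56)
    names(salary) <- LETTERS

    # Square distance matrix: one column per employee.
    m <- as.matrix(dist(salary))

    set.seed(1)  # bootstrap results vary from run to run
    fit <- pvclust(m, method.hclust = "ward.D2",
                   method.dist = "euclidean", nboot = 1000)

    plot(fit)                            # dendrogram annotated with AU/BP values
    pvrect(fit, alpha = 0.95)            # box the clusters with AU >= 0.95
    picked <- pvpick(fit, alpha = 0.95)  # the same clusters as a list
    picked$clusters
    ```

    The many edges with AU and BP equal to 1.000 in the output above suggest the bootstrap has little to resample in this one-variable example, so the boxes should be read with caution.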