rcluster-analysishierarchical-clusteringpvclust

Error when using ''ward'' method with pvclust R package


I am having some troubles regarding a cluster analysis that I am trying to do with the pvclust package.

Specifically, I have a data matrix composed by species (rows) and sampling stations (columns). I want to perform a CA in order to group my sampling stations according to my species abundance (which I have previously log(x+1) transformed).

Once having prepared adequately my matrix,I've tried to run a CA according to the pvclust package, using Ward's clustering method and Bray-Curtis as distance index. However, every time I get the following error message:

''Error in hclust(distance, method = method.hclust) : invalid clustering method''

I then tried to perform the same analysis using another cluster method, and I had no problem. I also tried to perform the same analysis using the hclust function from the vegan package, and I had no problem at all, too. The analysis run without any problems.

To better understand my problem, I'll display part of my matrix and the script that I used to perfrom the analysis:

          P1        P2         P3         P4         P5       P6
1  10.8750000 3.2888889  2.0769231  1.4166667  3.2395833 5.333333
3   0.3645833 0.3027778  0.3212038  0.7671958  0.4993676 0.000000
4   0.0000000 0.0000000  2.3500000  0.0000000  0.0000000 0.264000
5   0.0000000 0.7333333  0.2692308  0.0000000  0.2343750 0.000000
6   0.0000000 0.9277778  0.0000000  0.2936508  0.7291667 0.000000
7   0.4166667 6.3500000  1.0925463  0.5476190  0.1885169 0.000000
8   1.6250000 0.0000000  0.0000000  0.0000000  5.2187500 0.000000
9   0.0000000 0.8111111  0.0000000  0.0000000  0.0000000 0.000000
10  2.6770833 0.6666667  2.3304890  4.5906085  2.9652778 0.000000
15  1.8020833 0.9666667  1.4807137  3.3878968  0.1666667 0.000000
16 17.8750000 4.9555556  1.4615385  6.5000000  7.8593750 7.666667
19  4.5312500 1.0555556  3.5766941  6.7248677  2.3196181 0.000000
20  0.0000000 0.6777778  0.5384615  0.0000000  0.0000000 0.000000
21  0.0000000 0.9777778  0.0000000  0.2500000  0.0000000 0.000000
24  1.2500000 3.0583333  0.1923077  0.0000000  4.9583333 0.000000
25  0.0000000 0.0000000  2.5699634  0.0000000  0.0000000 0.000000
26  6.6666667 2.2333333 24.8730020 55.9980159 17.6239583 0.000000

Where P1-P6 are my sampling stations, and the leftmost row numbers are my different species. I'll denote this example matrix just as ''platforms''.

Afterwards, I've used the following code lines:

dist <- function(x, ...){
  vegdist(x, ...)
}

result<-pvclust(platforms,method.dist = "bray",method.hclust = "ward")

It is noteworthy that I run the three first codelines, since the bray-curtis index isn't originally available in the pvclust package. Thus, running these codelines allowed me to specify the bray-curtis index in the pvclust function

Does anyone know why it doesn't work with the pvclust package?

Any help will be much appreciated.

Kind regards,

Marie


Solution

  • There are two related issues:

    1. When calling method.hclust you need to pass hclust compatible methods. In theory pvclust checks for ward and converts to ward.D, but you probably want to pass the (correct) names of either ward.D or ward.D2.
    2. You cannot over-write dist in that fashion. However, you can pass a custom function to pvclust.

    For instance, this should work:

    library(vegan)
    library(pvclust)
    
    sample.data <- "P1  P2  P3  P4  P5  P6
    10.8750000  3.2888889   2.0769231   1.4166667   3.2395833   5.3333330
    0.3645833   0.3027778   0.3212038   0.7671958   0.4993676   0.0000000
    0.0000000   0.0000000   2.3500000   0.0000000   0.0000000   0.2640000
    0.0000000   0.7333333   0.2692308   0.0000000   0.2343750   0.0000000
    0.0000000   0.9277778   0.0000000   0.2936508   0.7291667   0.0000000
    0.4166667   6.3500000   1.0925463   0.5476190   0.1885169   0.0000000
    1.6250000   0.0000000   0.0000000   0.0000000   5.2187500   0.0000000
    0.0000000   0.8111111   0.0000000   0.0000000   0.0000000   0.0000000
    2.6770833   0.6666667   2.3304890   4.5906085   2.9652778   0.0000000
    1.8020833   0.9666667   1.4807137   3.3878968   0.1666667   0.0000000
    17.8750000  4.9555556   1.4615385   6.5000000   7.8593750   7.6666670
    4.5312500   1.0555556   3.5766941   6.7248677   2.3196181   0.0000000
    0.0000000   0.6777778   0.5384615   0.0000000   0.0000000   0.0000000
    0.0000000   0.9777778   0.0000000   0.2500000   0.0000000   0.0000000
    1.2500000   3.0583333   0.1923077   0.0000000   4.9583333   0.0000000
    0.0000000   0.0000000   2.5699634   0.0000000   0.0000000   0.0000000
    6.6666667   2.2333333   24.8730020  55.9980159  17.6239583  0.0000000"
    
    platforms <- read.table(text = sample.data, header = TRUE)
    
    result <- pvclust(platforms, 
                      method.dist = function(x){
                        vegdist(x, "bray")
                      },
                      method.hclust = "ward.D")