rhierarchical-clusteringbootstrappingveganpvclust

Computing p-values when using pvclust with Bray-Curtis similarity


Using species abundance data recorded for multiple samples, I want to create a dendrogram where branches represent the similarity of samples. The distance measure should be Bray-Curtis-similarity. For each node, I want to obtain p-values indicating stability by bootstrapping. The function pvclust() from the R package pvclust seems to be suitable for this problem. Unfortunately, Bray-Curtis-similarity is not natively implemented as a distance measure in pvclust. However, it is possible to pass a function which returns a dist object to the method.dist argument of pvclust. vegdist from the vegan package seems suitable.

Following the approach from an answer, I found here: Error when using ''ward'' method with pvclust R package, my current approach is the following (using the dune dataset from the package vegan):

# load libraries
library(vegan)
library(pvclust)

# load example data
data(dune)

# run pvclust
pv_result <- pvclust(dune,
                     method.hclust = "ward.D",
                     method.dist = function(x){vegan::vegdist(x, "bray")},
                     n = 1000,
                     parallel = TRUE)

# plot result
plot(pv_result)

This returns the following dendrogram:

enter image description here

The calculation of p-values does not seem to work, since they are 0 for all nodes. I am looking for a way to calculate the p-values. Also approaches using alternatives to pvclust are welcome.


Solution

  • Function pvclust seems to transpose input data and calculate the clustering for columns (variables) instead of rows (observations). This means that you need to transpose input in your method.dist function as well. Moreover, if you want to have clustering for rows, you must undo the transpose by transposing your input. You should modify your call as:

    library(pvclust)
    library(vegan)
    data(dune)
    pvclust(t(dune), # to cluster rows - omit t() to cluster columns
       method.dist = function(x) vegan::vegdist(t(x), "bray"), 
       n = 1000) # and other arguments you need
    

    NB, you need t(x) in the vegdist call in all cases.