Using species abundance data recorded for multiple samples, I want to create a dendrogram where branches represent the similarity of samples. The distance measure should be Bray-Curtis-similarity. For each node, I want to obtain p-values indicating stability by bootstrapping. The function pvclust()
from the R package pvclust
seems to be suitable for this problem. Unfortunately, Bray-Curtis-similarity is not natively implemented as a distance measure in pvclust
. However, it is possible to pass a function which returns a dist object to the method.dist
argument of pvclust
. vegdist
from the vegan
package seems suitable.
Following the approach from an answer, I found here: Error when using ''ward'' method with pvclust R package, my current approach is the following (using the dune dataset from the package vegan
):
# load libraries
library(vegan)
library(pvclust)
# load example data
data(dune)
# run pvclust
pv_result <- pvclust(dune,
method.hclust = "ward.D",
method.dist = function(x){vegan::vegdist(x, "bray")},
n = 1000,
parallel = TRUE)
# plot result
plot(pv_result)
This returns the following dendrogram:
The calculation of p-values does not seem to work, since they are 0 for all nodes. I am looking for a way to calculate the p-values. Also approaches using alternatives to pvclust
are welcome.
Function pvclust
seems to transpose input data and calculate the clustering for columns (variables) instead of rows (observations). This means that you need to transpose input in your method.dist
function as well. Moreover, if you want to have clustering for rows, you must undo the transpose by transposing your input. You should modify your call as:
library(pvclust)
library(vegan)
data(dune)
pvclust(t(dune), # to cluster rows - omit t() to cluster columns
method.dist = function(x) vegan::vegdist(t(x), "bray"),
n = 1000) # and other arguments you need
NB, you need t(x)
in the vegdist
call in all cases.