Tags: r, machine-learning, statistics, hierarchical-clustering, pvclust

Can someone explain the output from the pvclust function in R?


The pvclust package in R provides the pvclust() function. The example in its help file includes this call:

boston.pp <- pvpick(boston.pv)

This is supposed to pick out the clusters with high p-values. Printing the result gives:

$clusters
$clusters[[1]]
[1] "rm"   "medv"

$clusters[[2]]
[1] "zn"  "dis"

$clusters[[3]]
[1] "crim"    "indus"   "nox"     "age"     "rad"     "tax"     "ptratio" "lstat"  


$edges
[1] 3 5 9

I have a lot of trouble understanding what this output means, especially since I have very limited background in cluster analysis. In particular, I don't understand the meaning of the vector of names under each cluster. Can someone explain this for me? Thanks!


Solution

  • The pvclust reference manual (https://cran.r-project.org/web/packages/pvclust/pvclust.pdf) describes the input to pvclust:

    For data expressed as (n x p) matrix or data frame, we assume that the data is n observations of p objects, which are to be clustered. The i’th row vector corresponds to the i’th observation of these objects and the j’th column vector corresponds to a sample of j’th object with size n

    Output of pvpick:

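    clusters - a list of character string vectors. Each vector corresponds to the names of objects in each cluster.

    edges - a numeric vector of edge numbers. The i'th element corresponds to the edge of the i'th cluster.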

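    Have you plotted the dendrogram of the pvclust output? pvclust treats each column of the Boston dataset (each variable, such as rm or medv) as an object to be clustered, and each row as one observation of those objects. The clusters element of pvpick's output simply lists, for each cluster, the names of the columns that belong to it. With the defaults (alpha = 0.95, pv = "au", max.only = TRUE), pvpick returns the largest non-overlapping clusters whose AU p-value is at least 0.95. These are the same clusters that pvrect() boxes on the dendrogram, so you will see them if you plot it.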
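A minimal sketch of the help-file workflow, assuming the pvclust and MASS packages are installed. The seed, nboot = 100 and the printing loop are illustrative choices, not part of the original example. Because the p-values come from bootstrap resampling, the clusters you get may differ slightly from the output shown in the question.

```r
# Sketch: reproduce the pvclust help-file example and inspect pvpick's output.
# Assumes the pvclust and MASS packages are installed.
library(pvclust)
data(Boston, package = "MASS")

# pvclust clusters the COLUMNS (the variables of Boston), not the rows.
# To cluster the observations instead, you would pass t(Boston).
print(colnames(Boston))

set.seed(1)  # bootstrap resampling is random; this seed is arbitrary
boston.pv <- pvclust(Boston, nboot = 100, parallel = FALSE)

# Dendrogram annotated with AU and BP p-values and edge numbers
plot(boston.pv)
# Draw boxes around clusters whose AU p-value is >= 0.95
pvrect(boston.pv, alpha = 0.95)

# The same selection as pvrect, returned as data instead of drawn
boston.pp <- pvpick(boston.pv, alpha = 0.95)

# Each cluster is a character vector of column names; its edge number
# indexes a row of boston.pv$edges, which holds that edge's p-values.
for (k in seq_along(boston.pp$clusters)) {
  e <- boston.pp$edges[k]
  cat(sprintf("Cluster %d (edge %d, AU = %.2f, BP = %.2f): %s\n",
              k, e, boston.pv$edges[e, "au"], boston.pv$edges[e, "bp"],
              paste(boston.pp$clusters[[k]], collapse = ", ")))
}
```

Read against the output in the question, each `$clusters[[k]]` is the set of Boston variables in one strongly supported cluster, and `$edges[k]` is the number of the dendrogram edge where that cluster is formed. Looking up that row of `boston.pv$edges` shows why it was picked.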