r, cluster-analysis, hierarchical-clustering, pvclust

Adjust dendrogram made by the pvclust package


I would like to improve a dendrogram that I made with the pvclust package. Most of the AU / BP labels are unreadable, as you can see in the image.

Could you help me solve this? I would like to see all AU / BP labels on the dendrogram.

Executable code is below.

Thank you!

library(rdist)
library(pvclust)
library(geosphere)

df<-structure(list(Latitude = c(-23.8, -23.8, -23.9, -23.9, -23.9, -23.9, -23.9, -23.9, -23.9, -23.9, -23.9, -23.9, -23.9, -23.9,
-23.9, -23.9, -23.9, -23.9, -23.9), Longitude = c(-49.6, -49.6, -49.6, -49.6, -49.6, -49.6, -49.6, -49.6, -49.6, -49.6, -49.7,
-49.7, -49.7, -49.7, -49.7, -49.6, -49.6, -49.6, -49.6), Waste = c(526, 350, 526, 469, 285, 175, 175, 350, 350, 175, 350, 175, 175, 364,
175, 175, 350, 45.5, 54.6)), class = "data.frame", row.names = c(NA, -19L))

coordinates<-subset(df,select=c("Latitude","Longitude")) 
d<-as.dist(distm(coordinates[,2:1]))
mat <- as.matrix(d)
mat <- t(mat)
fit <- pvclust(mat, method.hclust="average", method.dist="euclidean", 
               nboot=1000, r=seq(0.9,1.4,by=.1))
fit
plot(fit,hang=-1,cex=.8,main="Average Linkage Clustering")
pvrect(fit, alpha=.80, pv="au", type="geq")

[Image: dendrogram with overlapping AU / BP labels]

[Image: dendrogram considering 325 locations]


Solution

  • The simplest way is to change the size of the plot window and increase the hang= argument:

    x11(width=12, height=8) # on macOS use quartz(width=12, height=8); on Windows use windows(width=12, height=8)
    plot(fit,hang=.05,cex=.8,main="Average Linkage Clustering")
    pvrect(fit, alpha=.80, pv="au", type="geq")
    

    [Image: Dendrogram]

    Here is an example with 150 cases (about half of the 325 you have), using a data set that is included with R:

    data(iris)
    mat <- t(as.matrix(iris[, 1:4]))
    fit <- pvclust(mat, method.hclust="average", method.dist="euclidean",
                   nboot=1000, r=seq(0.9,1.4,by=.1))
    

    Now print the results to pdf:

    pdf(file="Dendrogram.pdf", width=13, height=7.5)
    plot(fit,hang=.05, cex=.5, cex.pv=.5, main="Average Linkage Clustering")
    pvrect(fit, alpha=.80, pv="au", type="geq")
    dev.off()
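
    For a much larger tree, such as the one with 325 locations, a fixed 13-inch page may still be too narrow. A minimal sketch, assuming that roughly 1/8 inch per leaf is enough at cex=.5, scales the page width with the number of leaves. The helper name pdf_width_for, the 0.125 ratio, and the 13-inch minimum are illustrative assumptions to tune by eye, not part of pvclust:

    ```r
    # Hypothetical helper: choose a PDF width (in inches) from the number of leaves.
    # The 0.125 in/leaf ratio and the 13 in minimum are assumptions, not pvclust defaults.
    pdf_width_for <- function(n_leaves, inches_per_leaf = 0.125, min_width = 13) {
      max(min_width, n_leaves * inches_per_leaf)
    }

    pdf_width_for(19)   # small tree: falls back to the 13 in minimum
    pdf_width_for(325)  # 325 leaves: 40.625 in
    ```

    With a fitted object, the number of leaves should be available as length(fit$hclust$order), so the device call would become pdf(file="Dendrogram.pdf", width=pdf_width_for(length(fit$hclust$order)), height=7.5).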
    

    [Image: Dendrogram]

    The PDF has better resolution, and the text overlaps less. The other option is to reduce the labelling:

    plot(fit,hang=.05, cex=.5, cex.pv=.5, print.num=FALSE, print.pv=FALSE, 
         labels=FALSE, main="Average Linkage Clustering")
    pvrect(fit, alpha=.80, pv="au", type="geq")
    

    This prints just the dendrogram without any labeling, so you can see the structure but not the details. In some cases the data represent several groups; the iris data, for example, include three species. You can label the leaves by species instead by changing to labels=rep(1:3, each=50), so that the numbers 1, 2, 3 identify the three species.
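
    The species labelling described above could look like the following sketch. It reuses the iris fit from earlier; nboot is reduced to 100 only so the example runs quickly (an assumption, use a larger value for real p-values), and species_id is an illustrative variable name:

    ```r
    library(pvclust)

    data(iris)
    mat <- t(as.matrix(iris[, 1:4]))

    # nboot is kept small here only to make the sketch fast;
    # the original example uses nboot=1000.
    fit <- pvclust(mat, method.hclust="average", method.dist="euclidean",
                   nboot=100, r=seq(0.9,1.4,by=.1))

    # Integer species codes 1, 2, 3; for iris this equals rep(1:3, each=50)
    species_id <- as.integer(iris$Species)

    # Leaves are labelled by species code; AU / BP values and edge numbers are hidden
    plot(fit, hang=.05, cex=.5, print.num=FALSE, print.pv=FALSE,
         labels=species_id, main="Average Linkage Clustering")
    pvrect(fit, alpha=.80, pv="au", type="geq")
    ```

    Deriving the codes from iris$Species rather than typing rep(1:3, each=50) avoids relying on the row order, which matters if the data are ever sorted or subset.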