Tags: r, plot, ggplot2, ggdendro

ggplot2 and ggdendro - plotting color bars under the node leaves


I currently use ggplot2 and ggdendro to plot dendrograms. However, I now need to plot a discrete variable under the leaves, along with the labels.

For instance, in a publication (Zhang et al., 2006) I saw a dendrogram like this (notice the color bar under the leaf labels):

Example dendrogram

I'm interested in doing the same with ggdendro + ggplot2, using data which I have already binned. Is this possible?


Solution

  • First, build a data frame for the color bar. As an example I used the USArrests data: cluster it with hclust() and save the result as hc, then cut the tree into clusters with cutree() and store the cluster numbers in the column cluster. The column states holds the labels of hc as a factor whose levels are ordered by hc$order, that is, in the same left-to-right order as the leaves of the dendrogram.

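    library(ggdendro)
    library(ggplot2)
    # Average-linkage hierarchical clustering of the USArrests data
    hc <- hclust(dist(USArrests), "ave")
    # Cluster membership (6 clusters) plus state names as a factor in leaf order
    df2 <- data.frame(cluster = cutree(hc, 6),
                      states = factor(hc$labels, levels = hc$labels[hc$order]))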
    head(df2)
               cluster     states
    Alabama          1    Alabama
    Alaska           1     Alaska
    Arizona          1    Arizona
    Arkansas         2   Arkansas
    California       1 California
    Colorado         2   Colorado
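
The ordering of the factor levels is what places each tile under the right leaf, so it can be worth checking before plotting. The following is a minimal sketch using base R only; the names leaf_order and x_pos are illustrative and not part of the original answer.

```r
hc <- hclust(dist(USArrests), "ave")

# State names from left to right, as the leaves appear in the dendrogram
leaf_order <- hc$labels[hc$order]

states <- factor(hc$labels, levels = leaf_order)

# The integer code of each factor value is its position on a discrete x axis,
# so it should equal the position of that state among the leaves.
x_pos <- as.integer(states)
stopifnot(identical(x_pos, match(hc$labels, leaf_order)))

# First few leaves with their x positions
head(data.frame(state = leaf_order, x = seq_along(leaf_order)))
```

If the stopifnot() call passes silently, a tile drawn at states will sit at the same index as the corresponding leaf.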
    

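    Next, save two plots as objects: the dendrogram, and a color bar drawn with geom_tile() that uses states as the x values and the cluster number as the fill color. The theme settings remove all axis elements and the legend.

    # Dendrogram with leaf labels on the x axis
    p1 <- ggdendrogram(hc, rotate = FALSE)

    # Color bar: one tile per state, filled by cluster number
    p2 <- ggplot(df2, aes(states, y = 1, fill = factor(cluster))) + geom_tile() +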
      scale_y_continuous(expand=c(0,0))+
      theme(axis.title=element_blank(),
            axis.ticks=element_blank(),
            axis.text=element_blank(),
            legend.position="none")
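
The question mentions data that are already binned. In that case the cutree() column can be swapped for your own discrete variable. The sketch below is an assumption-laden illustration: it bins UrbanPop into three classes with arbitrary break points as a stand-in for real binned data, and the names urban_bin, df3 and p3 are hypothetical.

```r
library(ggplot2)

hc <- hclust(dist(USArrests), "ave")

# Stand-in for an already-binned variable (break points chosen arbitrarily);
# replace this with your own discrete column.
urban_bin <- cut(USArrests$UrbanPop,
                 breaks = c(0, 55, 75, 100),
                 labels = c("low", "medium", "high"))

# Same construction as df2, but with the binned variable instead of clusters
df3 <- data.frame(
  bin    = urban_bin,
  states = factor(rownames(USArrests), levels = hc$labels[hc$order])
)

# Color bar filled by the binned variable; the legend is kept so the bins can be read
p3 <- ggplot(df3, aes(states, y = 1, fill = bin)) +
  geom_tile() +
  scale_y_continuous(expand = c(0, 0)) +
  theme(axis.title = element_blank(),
        axis.ticks = element_blank(),
        axis.text = element_blank(),
        legend.position = "bottom")
```

p3 can then take the place of p2 in the alignment step.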
    

    Finally, align both plots using the approach from @Baptiste's answer to this question: convert both plots to gtables, give them the same layout widths, and stack them.

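    library(gridExtra)

    # Convert both plots to gtables so that their layout widths can be edited
    gp1 <- ggplotGrob(p1)
    gp2 <- ggplotGrob(p2)

    # Use the larger of the two plots' widths for layout columns 2 to 5 in both gtables
    maxWidth <- grid::unit.pmax(gp1$widths[2:5], gp2$widths[2:5])
    gp1$widths[2:5] <- as.list(maxWidth)
    gp2$widths[2:5] <- as.list(maxWidth)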
    
    grid.arrange(gp1, gp2, ncol=1,heights=c(4/5,1/5))
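
One caveat: as far as I can tell, ggdendrogram() places the leaves on a continuous x scale, while the tile plot uses a discrete one, and the two kinds of scale pad their ranges differently by default, so the tiles may sit slightly off from the leaves. A possible workaround is to draw both panels on the same numeric x axis with identical limits. This is only a sketch, assuming that dendro_data(), segment(), label() and theme_dendro() from ggdendro behave as documented; the names dd, leaves, xlim, p_tree and p_bar are illustrative. Note that it moves the leaf labels below the color bar.

```r
library(ggplot2)
library(ggdendro)
library(gridExtra)

hc <- hclust(dist(USArrests), "ave")
dd <- dendro_data(hc)      # segments and leaf labels with numeric x positions
leaves <- label(dd)        # one row per leaf: x, y, label
leaves$label <- as.character(leaves$label)
leaves$cluster <- factor(cutree(hc, 6)[leaves$label])

# Identical x limits for both panels: half a unit beyond the first and last leaf
xlim <- c(0.5, nrow(leaves) + 0.5)

# Dendrogram drawn from the raw segments, without axes
p_tree <- ggplot(segment(dd)) +
  geom_segment(aes(x = x, y = y, xend = xend, yend = yend)) +
  scale_x_continuous(expand = c(0, 0)) +
  coord_cartesian(xlim = xlim) +
  theme_dendro()

# Color bar on the same numeric x axis, with the leaf labels as axis text
p_bar <- ggplot(leaves, aes(x = x, y = 1, fill = cluster)) +
  geom_tile() +
  scale_x_continuous(breaks = leaves$x, labels = leaves$label, expand = c(0, 0)) +
  scale_y_continuous(expand = c(0, 0)) +
  coord_cartesian(xlim = xlim) +
  theme(axis.title = element_blank(),
        axis.ticks = element_blank(),
        axis.text.y = element_blank(),
        axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5),
        legend.position = "none")

grid.arrange(p_tree, p_bar, ncol = 1, heights = c(3/5, 2/5))
```

Because neither panel has y-axis text, their panel areas should span the same horizontal extent, so the width-matching step is not needed in this variant.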
    
