I'm currently using ggplot2 and ggdendro to plot dendrograms. Now I need to plot a discrete variable under the leaves, along with the labels.
For instance, in a publication (Zhang et al., 2006) I saw a dendrogram like this (note the color bar under the leaf labels):
I'm interested in doing the same with ggdendro + ggplot2, using data which I have already binned. Is this possible?
First, you need to make a data frame for the color bar. As an example I used the USArrests data: cluster it with hclust() and save the result, then use cutree() on that clustering object to split it into clusters, stored as column cluster. Column states contains the labels of the clustering object hc, and its factor levels are ordered the same way as the leaves of the dendrogram (hc$order).
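library(ggdendro)
library(ggplot2)

# Average-linkage clustering of the 50 states
hc <- hclust(dist(USArrests), "ave")
# cluster: membership in 6 groups; states: labels as a factor whose levels follow the leaf order
df2 <- data.frame(cluster = cutree(hc, 6),
                  states = factor(hc$labels, levels = hc$labels[hc$order]))
head(df2)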
           cluster     states
Alabama          1    Alabama
Alaska           1     Alaska
Arizona          1    Arizona
Arkansas         2   Arkansas
California       1 California
Colorado         2   Colorado
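Note that the rows of df2 stay in the original USArrests order (which is why the cutree() output can be used as-is), while only the factor levels of states follow the dendrogram. A quick base-R sanity check of both facts, as a sketch; leaf_labels is a name introduced here for illustration:

```r
hc <- hclust(dist(USArrests), "ave")
df2 <- data.frame(cluster = cutree(hc, 6),
                  states = factor(hc$labels, levels = hc$labels[hc$order]))

# Leaf labels of the tree from left to right, as they are drawn
leaf_labels <- labels(as.dendrogram(hc))

# The x order of the color bar (the factor levels) should match the leaves
stopifnot(identical(levels(df2$states), leaf_labels))

# cutree() returns one id per state in the original row order,
# so each row's cluster belongs to the state in that same row
stopifnot(identical(names(cutree(hc, 6)), as.character(df2$states)))
```

If either check fails for your own data, the bar will be drawn in a different order than the leaves.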
Now save two plots as objects: the dendrogram, and the color bar made with geom_tile(), using states as the x values and the cluster number as the fill color. The theme settings remove all axis titles, ticks and text, as well as the legend, from the color bar.
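# Dendrogram with the leaf labels along the bottom
p1 <- ggdendrogram(hc, rotate = FALSE)
# One tile per state, filled by cluster; axes and legend removed
p2 <- ggplot(df2, aes(states, y = 1, fill = factor(cluster))) +
  geom_tile() +
  scale_y_continuous(expand = c(0, 0)) +
  theme(axis.title = element_blank(),
        axis.ticks = element_blank(),
        axis.text = element_blank(),
        legend.position = "none")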
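The question mentions data that is already binned. The same recipe should work for any discrete variable, provided each value is looked up by leaf label rather than by row position. A sketch, assuming the binned variable is a factor named with the same labels as the clustering; here UrbanPop cut into three bins stands in for it, and binned, df_bins and p_bins are hypothetical names:

```r
library(ggplot2)

hc <- hclust(dist(USArrests), "ave")

# Hypothetical pre-binned variable: UrbanPop cut into three bins.
# Replace `binned` with your own factor, named by the leaf labels.
binned <- cut(USArrests$UrbanPop, breaks = 3, labels = c("low", "mid", "high"))
names(binned) <- rownames(USArrests)

leaf_order <- hc$labels[hc$order]
df_bins <- data.frame(
  states = factor(leaf_order, levels = leaf_order),
  bin    = binned[leaf_order]   # look up each leaf's bin by name
)

# Same color-bar styling as p2, filled by the binned variable
p_bins <- ggplot(df_bins, aes(states, y = 1, fill = bin)) +
  geom_tile() +
  scale_y_continuous(expand = c(0, 0)) +
  theme(axis.title = element_blank(),
        axis.ticks = element_blank(),
        axis.text = element_blank(),
        legend.position = "none")
```

p_bins can then take the place of p2 in the alignment step. Indexing by name rather than by position is what keeps each bin attached to the right leaf.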
Now you can use @Baptiste's answer to this question to align the two plots.
library(gridExtra)
gp1 <- ggplotGrob(p1)
gp2 <- ggplotGrob(p2)
# Use the larger of each pair of column widths so the two panels line up
maxWidth <- grid::unit.pmax(gp1$widths, gp2$widths)
gp1$widths <- maxWidth
gp2$widths <- maxWidth
# Dendrogram on top (4/5 of the height), color bar below (1/5)
grid.arrange(gp1, gp2, ncol = 1, heights = c(4/5, 1/5))
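Matching the gtable widths lines up the panels, but not necessarily the data inside them: the color bar uses a discrete x scale, while the dendrogram's leaves sit at numeric positions 1 to n, and the default expansions of the two scales may differ, so the outermost tiles can end up slightly offset from the outermost leaves. One way around this, sketched here as an alternative rather than the approach above, is to draw the tree from dendro_data() and give both panels the same numeric x scale with limits 0.5 to n + 0.5. The helper x_scale() and the names p_tree, p_bar and df_bar are hypothetical:

```r
library(ggplot2)
library(ggdendro)
library(gridExtra)

hc <- hclust(dist(USArrests), "ave")
ddata <- dendro_data(hc)            # leaves sit at x = 1, 2, ..., n
leaf_order <- hc$labels[hc$order]   # left-to-right leaf labels
n <- length(leaf_order)

# Color-bar data on the same numeric positions as the leaves
df_bar <- data.frame(x = seq_len(n),
                     cluster = factor(cutree(hc, 6)[leaf_order]))

# Same x range for both panels; 0.5 .. n + 0.5 covers the outer tile edges
x_scale <- function(...) {
  scale_x_continuous(..., limits = c(0.5, n + 0.5), expand = c(0, 0))
}

p_tree <- ggplot(segment(ddata)) +
  geom_segment(aes(x = x, y = y, xend = xend, yend = yend)) +
  x_scale(breaks = seq_len(n), labels = leaf_order) +  # leaf labels as axis text
  theme_dendro() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))

p_bar <- ggplot(df_bar, aes(x = x, y = 1, fill = cluster)) +
  geom_tile() +
  x_scale() +
  scale_y_continuous(expand = c(0, 0)) +
  theme_void() +
  theme(legend.position = "none")

# Match the column widths of the two gtables, then stack them
g_tree <- ggplotGrob(p_tree)
g_bar  <- ggplotGrob(p_bar)
max_w  <- grid::unit.pmax(g_tree$widths, g_bar$widths)
g_tree$widths <- max_w
g_bar$widths  <- max_w
grid.arrange(g_tree, g_bar, ncol = 1, heights = c(4, 1))
```

Because both panels share the same limits and zero expansion, tile i spans exactly the interval around leaf i, whatever the plot width.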