r, ggplot2, dendextend, ggdendro

Colorize Clusters in Dendrogram with ggplot2


Didzis Elferts showed how to plot a dendrogram using ggplot2 and ggdendro:

horizontal dendrogram in R with labels

Here is the code:

labs = paste("sta_",1:50,sep="") #new labels
rownames(USArrests)<-labs #set new row names
hc <- hclust(dist(USArrests), "ave")

library(ggplot2)
library(ggdendro)

#convert cluster object to use with ggplot
dendr <- dendro_data(hc, type="rectangle") 

#your own labels are supplied in geom_text() and label=label
ggplot() + 
  geom_segment(data=segment(dendr), aes(x=x, y=y, xend=xend, yend=yend)) + 
  geom_text(data=label(dendr), aes(x=x, y=y, label=label, hjust=0), size=3) +
  coord_flip() + scale_y_reverse(expand=c(0.2, 0)) + 
  theme(axis.line.y=element_blank(),
        axis.ticks.y=element_blank(),
        axis.text.y=element_blank(),
        axis.title.y=element_blank(),
        panel.background=element_rect(fill="white"),
        panel.grid=element_blank())

Does anyone know how to color the different clusters? For example, how would you color the two clusters obtained with k=2?


Solution

  • A workaround is to plot the cluster object with plot() and then use rect.hclust() to draw borders around the clusters (the number of clusters is set with the argument k=). If the result of rect.hclust() is saved as an object, it is a list in which each element contains the observations belonging to one cluster.

    plot(hc)
    gg<-rect.hclust(hc,k=2)
    

    This list can now be converted to a data frame in which the column clust contains the cluster names (two groups in this example). Each name is repeated according to the length of the corresponding list element.

    clust.gr<-data.frame(num=unlist(gg),
      clust=rep(c("Clust1","Clust2"),times=sapply(gg,length)))
    head(clust.gr)
          num  clust
    sta_1   1 Clust1
    sta_2   2 Clust1
    sta_3   3 Clust1
    sta_5   5 Clust1
    sta_8   8 Clust1
    sta_9   9 Clust1
    

    The new data frame is merged with the label() information of the dendr object (the dendro_data() result).

    text.df<-merge(label(dendr),clust.gr,by.x="label",by.y="row.names")
    head(text.df)
       label  x y num  clust
    1  sta_1  8 0   1 Clust1
    2 sta_10 28 0  10 Clust2
    3 sta_11 41 0  11 Clust2
    4 sta_12 31 0  12 Clust2
    5 sta_13 10 0  13 Clust1
    6 sta_14 37 0  14 Clust2
    

    When plotting the dendrogram, use text.df to add the labels with geom_text() and map the column clust to color.

    ggplot() + 
      geom_segment(data=segment(dendr), aes(x=x, y=y, xend=xend, yend=yend)) + 
      geom_text(data=text.df, aes(x=x, y=y, label=label, hjust=0,color=clust), size=3) +
      coord_flip() + scale_y_reverse(expand=c(0.2, 0)) + 
      theme(axis.line.y=element_blank(),
            axis.ticks.y=element_blank(),
            axis.text.y=element_blank(),
            axis.title.y=element_blank(),
            panel.background=element_rect(fill="white"),
            panel.grid=element_blank())
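
The cluster membership table used above can also be built without drawing a base plot first. This is a minimal sketch, assuming that cutree() from the stats package is an acceptable replacement for the rect.hclust() step. The names k, arrests, membership and clust.df are made up for this illustration. Note that cutree() numbers the clusters by the order in which they first appear in the data, while rect.hclust() lists them from left to right in the plotted tree, so the same group may end up with a different name.

```r
# Rebuild the clustering from the question (base R only).
arrests <- USArrests                          # work on a copy of the built-in data
rownames(arrests) <- paste("sta_", 1:50, sep = "")
hc <- hclust(dist(arrests), "ave")

k <- 2                                        # number of clusters to cut the tree into

# cutree() returns an integer vector of cluster ids, named by observation label.
membership <- cutree(hc, k = k)

# One row per observation: its label and a readable cluster name.
# clust.df is a hypothetical name, not part of the original answer.
clust.df <- data.frame(label = names(membership),
                       clust = paste0("Clust", membership),
                       stringsAsFactors = FALSE)
```

clust.df should then be usable in place of clust.gr, for example with merge(label(dendr), clust.df, by = "label"), and changing k is enough to get more groups.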
    

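Since the question is also tagged dendextend, here is a sketch of that route, assuming the dendextend and ggplot2 packages are installed. It is an alternative to the workaround above, not part of it, and it colors the branches as well as the labels. color_branches(), color_labels() and as.ggdend() come from dendextend, which also supplies the ggplot() method for the resulting object. The variable names dend, ggd and p are made up for this illustration.

```r
library(ggplot2)
library(dendextend)

# Same clustering as in the question, here with the default row names.
hc <- hclust(dist(USArrests), "ave")

# Convert to a dendrogram and color branches and labels by cluster (k = 2).
dend <- as.dendrogram(hc)
dend <- color_branches(dend, k = 2)
dend <- color_labels(dend, k = 2)

# Convert to a ggdend object and build a horizontal ggplot from it.
ggd <- as.ggdend(dend)
p <- ggplot(ggd, horiz = TRUE)

# print(p) draws the plot.
```

Because p is an ordinary ggplot object, further layers and theme() calls can be added to it in the usual way.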