Didzis Elferts showed how to plot a dendrogram using ggplot2 and ggdendro in "horizontal dendrogram in R with labels".
Here is the code:
labs = paste("sta_",1:50,sep="") #new labels
rownames(USArrests)<-labs #set new row names
hc <- hclust(dist(USArrests), "ave")
library(ggplot2)
library(ggdendro)
#convert cluster object to use with ggplot
dendr <- dendro_data(hc, type="rectangle")
#your own labels are supplied in geom_text() and label=label
ggplot() +
  geom_segment(data=segment(dendr), aes(x=x, y=y, xend=xend, yend=yend)) +
  geom_text(data=label(dendr), aes(x=x, y=y, label=label, hjust=0), size=3) +
  coord_flip() + scale_y_reverse(expand=c(0.2, 0)) +
  theme(axis.line.y=element_blank(),
        axis.ticks.y=element_blank(),
        axis.text.y=element_blank(),
        axis.title.y=element_blank(),
        panel.background=element_rect(fill="white"),
        panel.grid=element_blank())
Does anyone know how to colorize the different clusters? For example, what if you want 2 clusters (k=2) colorized?
A workaround is to plot the cluster object with plot() and then use the function rect.hclust() to draw borders around the clusters (the number of clusters is set with the argument k=). If the result of rect.hclust() is saved as an object, it is a list in which each element contains the observations belonging to one cluster.
plot(hc)
gg<-rect.hclust(hc,k=2)
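To see what rect.hclust() actually hands back, here is a small self-contained sketch that inspects the list. The copy named arrests is just a name chosen for this example so that the built-in USArrests is not modified.

```r
# Same data and labels as in the question, on a local copy
labs <- paste("sta_", 1:50, sep = "")
arrests <- USArrests                 # `arrests` is a name chosen for this sketch
rownames(arrests) <- labs
hc <- hclust(dist(arrests), "ave")

plot(hc)                             # rect.hclust() draws on an existing plot
gg <- rect.hclust(hc, k = 2)

length(gg)                           # one list element per cluster
sapply(gg, length)                   # number of observations in each cluster
head(names(gg[[1]]))                 # labels of observations in the first cluster
head(gg[[1]])                        # the values are row indices into the data
```

Each element is a named integer vector: the names are the observation labels and the values are their row positions, which is what the next step relies on.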
Now this list can be converted to a data frame in which the column clust contains the names of the clusters (two groups in this example). Each name is repeated according to the length of the corresponding list element.
clust.gr<-data.frame(num=unlist(gg),
clust=rep(c("Clust1","Clust2"),times=sapply(gg,length)))
head(clust.gr)
      num  clust
sta_1   1 Clust1
sta_2   2 Clust1
sta_3   3 Clust1
sta_5   5 Clust1
sta_8   8 Clust1
sta_9   9 Clust1
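If you would rather not open a plot just to get the memberships, a similar data frame can probably be built with cutree() from base R. This is a sketch of an alternative, not what the answer above does: clust.gr2 and arrests are names made up here, and cutree() numbers clusters by first appearance in the data, so its numbering is not guaranteed to match the left-to-right order used by rect.hclust().

```r
# Same data and labels as in the question, on a local copy
labs <- paste("sta_", 1:50, sep = "")
arrests <- USArrests                 # `arrests` is a name chosen for this sketch
rownames(arrests) <- labs
hc <- hclust(dist(arrests), "ave")

k <- 2
memb <- cutree(hc, k = k)            # named integer vector: label -> cluster id

# One row per observation, row names are the labels (as in clust.gr)
clust.gr2 <- data.frame(num = seq_along(memb),
                        clust = paste0("Clust", memb),
                        row.names = names(memb))
head(clust.gr2)
table(clust.gr2$clust)               # cluster sizes
```

Because the cluster names are generated from the ids, changing k is enough to get more groups; there is no hard-coded c("Clust1","Clust2") to keep in sync.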
The new data frame is then merged with the label() information of the dendr object (the result of dendro_data()).
text.df<-merge(label(dendr),clust.gr,by.x="label",by.y="row.names")
head(text.df)
   label  x y num  clust
1  sta_1  8 0   1 Clust1
2 sta_10 28 0  10 Clust2
3 sta_11 41 0  11 Clust2
4 sta_12 31 0  12 Clust2
5 sta_13 10 0  13 Clust1
6 sta_14 37 0  14 Clust2
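Note that merge() sorts the result by the key column by default, which is why the rows above run sta_1, sta_10, sta_11 and so on. If you want to keep the rows in dendrogram order, a lookup with match() should work as well. This is only a sketch under the assumption that memberships come from cutree(); text.df2 and arrests are names invented for the example.

```r
library(ggdendro)

# Same data and labels as in the question, on a local copy
labs <- paste("sta_", 1:50, sep = "")
arrests <- USArrests                 # `arrests` is a name chosen for this sketch
rownames(arrests) <- labs
hc <- hclust(dist(arrests), "ave")
dendr <- dendro_data(hc, type = "rectangle")

memb <- cutree(hc, k = 2)            # named vector: label -> cluster id
lab.df <- label(dendr)               # columns x, y, label (leaf order)

# Look up each leaf label in the membership vector
idx <- match(as.character(lab.df$label), names(memb))
text.df2 <- cbind(lab.df, num = idx, clust = paste0("Clust", memb[idx]))

stopifnot(!anyNA(text.df2$clust))    # every label found a cluster
head(text.df2)
```

The stopifnot() line is a cheap guard: if a label were missing from the membership vector, match() would return NA and the check would fail instead of silently dropping a row.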
When plotting the dendrogram, use text.df to add the labels with geom_text() and map the column clust to color.
ggplot() +
  geom_segment(data=segment(dendr), aes(x=x, y=y, xend=xend, yend=yend)) +
  geom_text(data=text.df, aes(x=x, y=y, label=label, hjust=0,color=clust), size=3) +
  coord_flip() + scale_y_reverse(expand=c(0.2, 0)) +
  theme(axis.line.y=element_blank(),
        axis.ticks.y=element_blank(),
        axis.text.y=element_blank(),
        axis.title.y=element_blank(),
        panel.background=element_rect(fill="white"),
        panel.grid=element_blank())
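The same idea should extend to more than two clusters and to colors of your own choosing. The sketch below uses k = 3 and scale_color_manual() from ggplot2. The color names, the object names p and arrests, and the use of cutree() for the memberships are all choices made for this example rather than part of the answer above. Only the label text is colored here, not the branches.

```r
library(ggplot2)
library(ggdendro)

# Same data and labels as in the question, on a local copy
labs <- paste("sta_", 1:50, sep = "")
arrests <- USArrests                 # `arrests` is a name chosen for this sketch
rownames(arrests) <- labs
hc <- hclust(dist(arrests), "ave")
dendr <- dendro_data(hc, type = "rectangle")

k <- 3
memb <- cutree(hc, k = k)            # named vector: label -> cluster id
text.df <- label(dendr)
text.df$clust <- factor(paste0("Clust", memb[as.character(text.df$label)]))

p <- ggplot() +
  geom_segment(data = segment(dendr),
               aes(x = x, y = y, xend = xend, yend = yend)) +
  geom_text(data = text.df,
            aes(x = x, y = y, label = label, color = clust),
            hjust = 0, size = 3) +
  coord_flip() + scale_y_reverse(expand = c(0.2, 0)) +
  # one color per cluster name; these particular colors are arbitrary
  scale_color_manual(values = c(Clust1 = "firebrick",
                                Clust2 = "steelblue",
                                Clust3 = "darkgreen")) +
  theme(axis.line.y = element_blank(),
        axis.ticks.y = element_blank(),
        axis.text.y = element_blank(),
        axis.title.y = element_blank(),
        panel.background = element_rect(fill = "white"),
        panel.grid = element_blank())
p
```

If you change k, remember to supply one entry in values for each cluster name, otherwise ggplot2 will not know how to color the extra groups.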