I want to set the color of the branches of my dendrogram based on manually assigned groups of my leaves. I know in advance that, e.g., leaves A-C should be red, and all branches that lead only to red leaves should be colored red as well.
I can color branches of my dendrogram using the "dendextend" package.
However, I have no control over which color gets assigned to which cluster ID. dendextend assigns the first color to the first cluster ID it finds, regardless of whether that is ID 1. But I need ID 1 colored in color 1, etc., because I need a legend.
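To make the problem concrete, here is a small base-R model of the behavior described above. It is not dendextend's code; it only assumes, as stated above, that colors are handed out in the order in which cluster IDs first appear in the leaf order. The helpers `color_by_appearance` and `color_by_id` and the example leaf order are hypothetical.

```r
# Hypothetical model of the behavior described in the question (not dendextend's
# actual implementation): the i-th color goes to the i-th distinct cluster ID
# encountered when walking the leaves from left to right.
color_by_appearance <- function(ids, pal) {
  pal[match(ids, unique(ids))]
}

# What the question asks for instead: color k always goes to cluster ID k.
color_by_id <- function(ids, pal) {
  pal[ids]
}

clucols <- c("red", "blue", "green")  # ID 1 = red, ID 2 = blue, ID 3 = green

# Made-up leaf order in which cluster 2 happens to be plotted first
leaf_ids <- c(2L, 2L, 2L, 1L, 1L, 1L, 3L, 3L, 3L)

by_appearance <- color_by_appearance(leaf_ids, clucols)
by_id <- color_by_id(leaf_ids, clucols)

print(by_appearance)  # cluster 2 gets red, cluster 1 gets blue
print(by_id)          # cluster 1 gets red, cluster 2 gets blue
```

The two mappings only agree when the clusters happen to be plotted in ID order.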
See this example. I want a dendrogram that colors the labels and branches of A-C in red, D-F in blue and G-I in green.
suppressPackageStartupMessages(library(dendextend))
library(dplyr)
set.seed(12346)
# Sample data:
# ------------
# l = Leaf labels | g = assigned color of leaf | x = value for clustering
dat <- tibble(l = LETTERS[1:9],
g = factor(rep(letters[1:3], each = 3)),
x = round(runif(9,0,10)))
# color_branches() needs integer cluster IDs
dat$gi <- dat$g %>% as.integer()
# Color IDs of each group
dat %>% distinct(g, gi)
## # A tibble: 3 x 2
## g gi
## <fct> <int>
## 1 a 1
## 2 b 2
## 3 c 3
# ID 1 = red, ID 2 = blue, ID 3 = green
clucols <- c("red", "blue", "green")
# Clustering & Dendrogram
# -----------------------
dst <- dist(setNames(dat$x, dat$l))
den <- as.dendrogram(hclust(dst))
o <- order.dendrogram(den)
den <- den %>%
color_branches(col = clucols, clusters = dat$gi[o])
# Transfer branch colors to labels
labels_colors(den) <- get_leaves_branches_col(den)
plot(den)
# Legend
dat %>% distinct(g, gi) %>%
{legend("topright", legend = .$g, col = clucols[.$gi], lty = 1)}
Result: the leaves are not colored in the order I want, but by the position of each cluster on the plot, from left to right.
If you change the set.seed(...) line to set.seed(12345), the coloring appears correct. But this is only because, by chance, the clusters appear in the correct order when read from left to right.
How do I make color_branches() assign colors by cluster ID, not by which cluster comes first?
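If color_branches() does hand out `col` in order of first appearance (an assumption based on the behavior described above, not verified against the dendextend source), one possible fix is to permute the palette before passing it in, so that position i holds the color of the i-th cluster ID encountered in leaf order. The sketch below checks this permutation in plain R; `reorder_palette` and `color_by_appearance` are hypothetical helpers, and the leaf IDs are made-up example values.

```r
# Hypothetical helper: permute a by-ID palette so that a consumer which assigns
# colors in order of first appearance ends up giving pal[k] to cluster ID k.
reorder_palette <- function(ids_in_leaf_order, pal) {
  pal[unique(ids_in_leaf_order)]
}

# Simple model of first-appearance coloring (assumption, not dendextend code)
color_by_appearance <- function(ids, pal) {
  pal[match(ids, unique(ids))]
}

clucols <- c("red", "blue", "green")                # ID 1 = red, ID 2 = blue, ID 3 = green
leaf_ids <- c(3L, 3L, 3L, 1L, 1L, 1L, 2L, 2L, 2L)  # made-up leaf order

pal <- reorder_palette(leaf_ids, clucols)
result <- color_by_appearance(leaf_ids, pal)

print(pal)     # "green" "red" "blue"
print(result)  # every leaf gets clucols[its ID]
```

Applied to the code above, this would mean passing `col = clucols[unique(dat$gi[o])]` to color_branches(); whether dendextend behaves exactly like this model has not been confirmed here.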
Related: "Dendextend: Regarding how to color a dendrogram’s labels according to defined groups". That question is related, but it only targets coloring labels.
Related: "Color dendrogram branches based on external labels uptowards the root until the label matches". An answer there proposed branches_attr_by_clusters, which I translated to my example like this:
den <- den %>%
branches_attr_by_clusters(
values = clucols[dat$gi[o]],
clusters = dat$gi[o],
attr = "col")
Alas, the result was the same.
A workaround is to use the function branches_attr_by_labels to assign the color to the branches of each group separately.
Replace this code in the question:
den <- den %>%
color_branches(col = clucols, clusters = dat$gi[o])
with the code below.
You need a list with one element per group. Each element in turn contains the labels you want to color and the group ID, which selects the color from clucols. You can get it, for example, like this:
library(purrr)
colmap <- dat %>% group_by(g) %>% summarise(l = list(l)) %>% transpose()
colmap
## [[1]]
## [[1]]$g
## [1] 1
##
## [[1]]$l
## [1] "A" "B" "C"
##
##
## [[2]]
## [[2]]$g
## [1] 2
##
## [[2]]$l
## [1] "D" "E" "F"
##
##
## [[3]]
## [[3]]$g
## [1] 3
##
## [[3]]$l
## [1] "G" "H" "I"
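If you would rather not depend on dplyr and purrr for this step, a list of the same shape can be built in base R with split() and Map(). This is an alternative sketch, not the approach used in this answer; it uses a plain data.frame with the question's column names (the numeric x column is omitted because only the grouping matters), and it stores the group as its integer code so that clucols[m$g] indexes the palette by group ID.

```r
# Same label and group columns as the question's `dat`
dat <- data.frame(l = LETTERS[1:9],
                  g = factor(rep(letters[1:3], each = 3)),
                  stringsAsFactors = FALSE)

# split() returns one character vector of labels per factor level, in level order
labels_by_group <- split(dat$l, dat$g)

# One list element per group: integer group code `g` and its labels `l`
colmap <- Map(function(g, l) list(g = g, l = l),
              seq_along(labels_by_group), unname(labels_by_group))

str(colmap)
```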
Then apply branches_attr_by_labels to each element. Because it takes a dendrogram plus some changing parameters and also returns a dendrogram, you can use purrr::reduce or base::Reduce:
den <- reduce(.x = colmap, .init = den, .f = function(d, m)
  branches_attr_by_labels(d, m$l, clucols[m$g]))
Alternatively, slightly longer:
for(e in colmap){
den <- branches_attr_by_labels(den, e$l, clucols[e$g])
}
Result for set.seed(12346). Compare with the picture above.
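base::Reduce is mentioned as an option but not shown. The sketch below illustrates the same left fold without needing dendextend: a named character vector of leaf colors stands in for the dendrogram, and the hypothetical `set_group_color()` stands in for branches_attr_by_labels(). With dendextend loaded, the corresponding real call would be `den <- Reduce(function(d, m) branches_attr_by_labels(d, m$l, clucols[m$g]), colmap, den)`.

```r
clucols <- c("red", "blue", "green")

# Same shape as `colmap` in the answer: one element per group
colmap <- list(list(g = 1L, l = c("A", "B", "C")),
               list(g = 2L, l = c("D", "E", "F")),
               list(g = 3L, l = c("G", "H", "I")))

# Stand-in for a dendrogram: every leaf starts out black
leaf_cols <- setNames(rep("black", 9), LETTERS[1:9])

# Hypothetical stand-in for branches_attr_by_labels(): takes the current
# object plus one group's labels and color, and returns the updated object
set_group_color <- function(cols, labels, col) {
  cols[labels] <- col
  cols
}

# base::Reduce folds from the left: each step's result is fed into the next,
# starting from the initial value, like the purrr::reduce(.init = ...) call
leaf_cols <- Reduce(function(d, m) set_group_color(d, m$l, clucols[m$g]),
                    colmap, leaf_cols)

print(leaf_cols)
```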