I wish to visualize how well a clustering algorithm is doing (with certain distance metric). I have samples and their corresponding classes. To visualize, I cluster and I wish to color the branches of a dendrogram by the items in the cluster. The color will be the color most items in the hierarchical cluster correspond to (given by the data\classes).
Example: If my clustering algorithm chose indexes 1,21,24 to be a certain cluster (at a certain level) and I have a csv file containing a class number in each row corresponding to lets say 1,2,1. I want this edge to be coloured 1.
Example Code:
require(cluster)
suppressPackageStartupMessages(library(dendextend))
dir <- 'distance_metrics/'
filename <- 'aligned.csv'
my.data <- read.csv(paste(dir, filename, sep=""), header = T, row.names = 1)
my.dist <- as.dist(my.data)
real.clusters <-read.csv("clusters", header = T, row.names = 1)
clustered <- diana(my.dist)
# dend <- colour_branches(???dend, max(real.clusters)???)
plot(dend)
EDIT: another example partial code
dir <- 'distance_metrics/' # csv in here contains a symmetric matrix
clust.dir <- "clusters/" #csv in here contains a column vector with classes
my.data <- read.csv(paste(dir, filename, sep=""), header = T, row.names = 1)
filename <- 'table.csv'
my.dist <- as.dist(my.data)
real.clusters <-read.csv(paste(clust.dir, filename, sep=""), header = T, row.names = 1)
clustered <- diana(my.dist)
dnd <- as.dendrogram(clustered)
Both node and edge color attributes can be set recursively on "dendrogram" objects (which are just deeply nested lists) using dendrapply
. The cluster package also features an as.dendrogram
method for "diana" class objects, so conversion between the object types is seamless. Using your diana
clustering and borrowing some code from @Edvardoss iris example, you can create the colored dendrogram as follows:
library(cluster)
set.seed(999)
iris2 <- iris[sample(x = 1:150,size = 50,replace = F),]
clust <- diana(iris2)
dnd <- as.dendrogram(clust)
## Duplicate rownames aren't allowed, so we need to set the "labels"
## attributes recursively. We also label inner nodes here.
rectify_labels <- function(node, df){
newlab <- df$Species[unlist(node, use.names = FALSE)]
attr(node, "label") <- (newlab)
return(node)
}
dnd <- dendrapply(dnd, rectify_labels, df = iris2)
## Create a color palette as a data.frame with one row for each spp
uniqspp <- as.character(unique(iris$Species))
colormap <- data.frame(Species = uniqspp, color = rainbow(n = length(uniqspp)))
colormap[, 2] <- c("red", "blue", "green")
colormap
## Now color the inner dendrogram edges
color_dendro <- function(node, colormap){
if(is.leaf(node)){
nodecol <- colormap$color[match(attr(node, "label"), colormap$Species)]
attr(node, "nodePar") <- list(pch = NA, lab.col = nodecol)
attr(node, "edgePar") <- list(col = nodecol)
}else{
spp <- attr(node, "label")
dominantspp <- levels(spp)[which.max(tabulate(spp))]
edgecol <- colormap$color[match(dominantspp, colormap$Species)]
attr(node, "edgePar") <- list(col = edgecol)
}
return(node)
}
dnd <- dendrapply(dnd, color_dendro, colormap = colormap)
## Plot the dendrogram
plot(dnd)