I am trying to plot the result of agglomerative clustering (UPGMA with Agnes) in the same 'style' as when plotting a tree using the package 'ape'. A simple example I include in the figure below
The key issue is that I want to be able to color the leaves of the dendrogram based on the a pattern in the labels of the leaves. I tried two approaches: either I used hc2Newick
or I used the code by Joris Meys as proposed in an answer to Change Dendrogram leaves . Both did not give a satisfactory output. It might be that I do not fully understand the way the dendrograms are constructed either. An ASCII save of the abundance.agnes.ave
object (stored from running agnes) can be found on https://www.dropbox.com/s/gke9qnvwptltkky/abundance.agnes.ave .
When I use the first option (with hc2Newick
from bioconductor's ctc
package) I get the following figure when using this code:
write(hc2Newick(as.hclust(abundance.agnes.ave)),file="all_samples_euclidean.tre")
eucltree<-read.tree(file="all_samples_euclidean.tre")
eucltree.laz<-ladderize(eucltree,FALSE)
tiplabs<-eucltree$tip.label
numbertiplabs<-length(tiplabs)
colourtips<-rep("green",numbertiplabs)
colourtips[grep("II",tiplabs)]<-"red"
plot(eucltree.laz,tip.color=colourtips,adj=1,cex=0.6,use.edge.length=F)
add.scale.bar()
This is obviously not ideal, the 'alignment' of the plot is not as I wanted. I supose this has to do with the branch length calculation however I do not have the foggiest idea how to solve this issue. Certainly when compared to the results of the colLab function, which look more like the dendrogram style I'd like to report. Also, using use.edge.length=T
in the code above does give a clustering that is not 'aligned' properly:
The second approach using Joris Meys' colLab function with the following code gives the next figure
clusDendro<-as.dendrogram(as.hclust(abundance.agnes.ave))
labelColors<-c("red","green")
clusMember<-rep(1,length(rownames(abundance.x)))
clusMember[grep("II",rownames(abundance.x))]<-2
names(clusMember)<-rownames(abundance.x)
colLab <- function(n)
{
if(is.leaf(n)) {
a <- attributes(n)
# clusMember - a vector designating leaf grouping
# labelColors - a vector of colors for the above grouping
labCol <- labelColors[clusMember[which(names(clusMember) == a$label)]]
attr(n, "nodePar") <- c(a$nodePar, lab.col = labCol)
}
n
}
clusDendro<-dendrapply(clusDendro, colLab)
plot(clusDendro,horiz=T,axes=F)
This plot is getting closer to what I want, however I do not know why the open circles appear at the leaves and how to remove them.
Any help is much appreciated.
Kind regards,
FM
I wrote that code quite a while ago, and it appears there's something changed a little in the mechanism.
The plot.dendrogram
function I used, has an argument nodePar
. The behaviour has changed since the last time I used that function, and although that's used normally for the inner nodes, it apparently has an effect on the outer nodes as well. The default value for pch
is 1:2
now, according to the help files.
Hence, you need to specifically specify pch=NA
in the attributes you add to the outer nodes in the colLab
function. Try adapting it like this:
colLab <- function(n)
{
if(is.leaf(n)) {
a <- attributes(n)
# clusMember - a vector designating leaf grouping
# labelColors - a vector of colors for the above grouping
labCol <- labelColors[clusMember[which(names(clusMember) == a$label)]]
attr(n, "nodePar") <-
if(is.list(a$nodePar)) c(a$nodePar, lab.col = labCol,pch=NA) else
list(lab.col = labCol,pch=NA)
}
n
}
On my machine, that solves the problem.
Alternatively, you could take a look at the argument use.edge.length
of the function plot.phylo
in the ape
package. You set it to FALSE
, but from your explanation I believe you want it to be set on the default, TRUE
.
EDIT: In order to make the function more generic, it might be a good idea to add labelColors
and clusMember
as arguments to the function. My quick-n-dirty solution isn't the best example of clean code...
Also forget what I said about using the edge length. the ape package interpretes it as a real dendrogram and putting use.edge.length
to TRUE
will convert the edge lengths to evolution time. Hence the 'weird' outlining of the dendrogram.
Also note that in case the treeleafs don't have a nodePar
attribute, adding extra parameters using the c()
function will lead to undesired effects: if you add eg lab.cex=0.6
, the c()
function will create a vector instead of a list, and convert the value for lab.cex
to character whenever there's a character value in the parameter list. In this case that's going to be the name of the color, and that explains the error you talk about in the comment.