I have a phylogenetic tree that shows genes and how they cluster together. It was plotted from a Euclidean distance matrix using the ape package. For more details, here is the earlier link.
Here is my data (gg.txt), which was converted to a gene matrix.
ID gene1 gene2
1 ADRA1D ADK
2 ADRA1B ADK
3 ADRA1A ADK
4 ADRB1 ASIC1
5 ADRB1 ADK
6 ADRB2 ASIC1
7 ADRB2 ADK
8 AGTR1 ACHE
9 AGTR1 ADK
10 ALOX5 ADRB1
11 ALOX5 ADRB2
12 ALPPL2 ADRB1
13 ALPPL2 ADRB2
14 AMY2A AGTR1
15 AR ADORA1
16 AR ADRA1D
17 AR ADRA1B
18 AR ADRA1A
19 AR ADRA2A
20 AR ADRA2B
The final code to generate the tree is:
library(ape)
# Read the gene pairs (columns: ID, gene1, gene2)
tab <- read.table("gg.txt", header=TRUE, stringsAsFactors=FALSE)
gene.names <- sort(unique(c(tab[,"gene1"], tab[,"gene2"])))
# Square 0/1 matrix with genes as both row and column names
gene.matrix <- matrix(0L, nrow=length(gene.names), ncol=length(gene.names),
                      dimnames=list(gene.names, gene.names))
# Mark each listed (gene1, gene2) pair with 1
gene.matrix[as.matrix(tab[-1])] <- 1L
## calculating distances
d <- dist(gene.matrix, method="euclidean")
# "ward" was renamed to "ward.D" in R 3.1.0; the old name only triggers a message
fit <- hclust(d, method="ward.D")
plot(as.phylo(fit))
We can see that 4 big clusters are formed. ALOX5, AR and ALPPL2 form one cluster. ADRA1A, ADRA1B, ADRA1D and AGTR1 form another. Similarly, there are 2 more clusters. Is there any way to put this information in a table, for example like the one below? Is there any software available to do that?
GENE CLUSTER
ALOX5 1
AR 1
ALPPL2 1
ADRA1A 2
ADRA1B 2
ADRA1D 2
AGTR1 2
..
..
..
I have only shown 20 rows, but I have 21k rows, so that's the main concern.
As per @JTT, cutree works great! This is what I was looking for.
cut <- cutree(fit, k=5)
cut
  ACHE    ADK ADORA1 ADRA1A ADRA1B ADRA1D ADRA2A ADRA2B  ADRB1  ADRB2  AGTR1
     1      1      1      2      2      2      1      1      3      3      2
 ALOX5 ALPPL2  AMY2A     AR  ASIC1
     4      4      1      5      1
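For anyone who wants the result in the GENE / CLUSTER layout from the question, here is a minimal sketch of one way to turn the named vector returned by cutree into a data frame. It inlines the 20 example rows so it runs without gg.txt, and it only needs base R (ape is not required for this step). The names clusters and cluster.table and the output file name gene_clusters.txt are my own choices for illustration, not something from the original code.

```r
# Same 20 example rows as above, inlined so the sketch runs without gg.txt
tab <- read.table(text = "
ID gene1 gene2
1 ADRA1D ADK
2 ADRA1B ADK
3 ADRA1A ADK
4 ADRB1 ASIC1
5 ADRB1 ADK
6 ADRB2 ASIC1
7 ADRB2 ADK
8 AGTR1 ACHE
9 AGTR1 ADK
10 ALOX5 ADRB1
11 ALOX5 ADRB2
12 ALPPL2 ADRB1
13 ALPPL2 ADRB2
14 AMY2A AGTR1
15 AR ADORA1
16 AR ADRA1D
17 AR ADRA1B
18 AR ADRA1A
19 AR ADRA2A
20 AR ADRA2B
", header = TRUE, stringsAsFactors = FALSE)

# Build the 0/1 gene matrix the same way as in the question
gene.names <- sort(unique(c(tab$gene1, tab$gene2)))
gene.matrix <- matrix(0L, nrow = length(gene.names), ncol = length(gene.names),
                      dimnames = list(gene.names, gene.names))
gene.matrix[as.matrix(tab[c("gene1", "gene2")])] <- 1L

# Cluster and cut into k groups (k = 5 as in the edit above)
fit <- hclust(dist(gene.matrix, method = "euclidean"), method = "ward.D")
clusters <- cutree(fit, k = 5)   # named integer vector: gene -> cluster id

# Two-column table, sorted by cluster and then by gene name
cluster.table <- data.frame(GENE = names(clusters),
                            CLUSTER = unname(clusters),
                            stringsAsFactors = FALSE)
cluster.table <- cluster.table[order(cluster.table$CLUSTER, cluster.table$GENE), ]
rownames(cluster.table) <- NULL
print(cluster.table)

# Optional: save as a tab-separated file (hypothetical file name)
# write.table(cluster.table, "gene_clusters.txt", sep = "\t",
#             quote = FALSE, row.names = FALSE)
```

The cluster numbers are just labels assigned by cutree, so they will not necessarily match the numbering in the hand-written example table. If a cut height is easier to pick from the plot than a cluster count, cutree also accepts h instead of k.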