I have a dataset with different levels of branches starting from the top level: stock -> mbranch -> sbranch -> lsbranch. I want to be able to visualize these levels of my data into Newick format. I have different language groups within each stock level and would like to make different trees based off of these highest level groups.
For example my data is in the format as follows:
sample= data.frame("stock" = c("A", "A", "B", "B", "B"), "mbranch" = c("C", "D", "E", "F", "G"), "sbranch" = c("H", "O", NA, "K", "L"), "lsbranch" = c("I", "J", NA, "M", "N"), "name" = c("Andrea", "Kevin", "Charlie", "Naomi", "Sam"))
And I am trying to have an output of the newick tree format which would be something like:
tree = "(A(C(H(I(Andrew))),D(O(J(Kevin)))),B(E(Charlie),F(K(M(Naomi))),G(L(N(Sam)))));"
plot(read.dendrogram(tree))
I'm doing this so later on I can do a distance matrix of the nodes of my outputted tree.
Would the function write.tree be able to analyze data like this and make a tree from this (assuming my actual dataset is much larger)? Or in general, a function that would output the tree format. Thanks
You can use the ape::read.tree()
function to read your newick format tree
tree = "(A(C(H(I(Andrew))),D(O(J(Kevin)))),B(E(Charlie),F(K(M(Naomi))),G(L(N(Sam)))));"
my_tree <- read.tree(text = tree)
plot(my_tree)
You can then use ape::write.tree
to save the tree into a newick file:
write.tree(my_tree, file = "my_file_name.tre")
To convert your table into a "phylo"
object from ape
you can use this function (that might need some adjustements):
## The function
data.frame.to.phylo <- function(sample){
## Making an edge table
edge_table <- rbind(
## The root connecting A to B
rbind(c("root", "A"),c("root", "B")),
## All the nodes connecting to the tips
cbind(sample$stock, sample$name)
)
## Translating the values in the edge table into edge IDs
## The order must be tips, root, nodes
element_names <- c(unique(sample$name), "root", unique(sample$stock))
element_ids <- seq(1:length(element_names))
## Looping through each ID and name
for(element in element_ids) {
edge_table <- ifelse(edge_table == element_names[element], element_ids[element], edge_table)
}
## Make numeric
edge_table <- apply(edge_table, 2, as.numeric)
## Build the phylo object
phylo_object <- list()
phylo_object$edge <- edge_table
phylo_object$tip.label <- unique(sample$name)
phylo_object$node.label <- c("root", unique(sample$stock))
phylo_object$Nnode <- length(phylo_object$node.label)
## Forcing the class to be "phylo"
class(phylo_object) <- "phylo"
return(phylo_object)
}
## The data
sample = data.frame("stock" = c("A", "A", "B", "B", "B"), "mbranch" = c("C", "D", "E", "F", "G"), "sbranch" = c("H", "O", NA, "K", "L"), "lsbranch" = c("I", "J", NA, "M", "N"), "name" = c("Andrea", "Kevin", "Charlie", "Naomi", "Sam"))
## Plotting the data.frame for testing the function
plot(data.frame.to.phylo(sample))
Cheers, Thomas