I have just started learning bioinformatic analyses, so my question may be very basic. I have microbiome data with around 33,000 sequences as rows and their taxonomy as columns: domain, phylum, class, order, family, genus, and species. I want to convert this data into Newick format so I can build a phylogenetic tree from it.
Some questions I found that are similar to mine:
"Trying to write data into newick format R": the answer to this question is the closest to what I want, but doing this manually is impossible given the size of my data.
"How to create Newick tree format from raw morphology data in R on Mac OSX": I tried the phangorn package, but it's not working for me.
"Convert csv to Newick tree": I don't know Python, so I have difficulty fully understanding this code.
I have made a small sample of my taxonomy data:
structure(c("A", "C", "B", "A", "C", "B", "C", "B", "B", "C",
"C", "B", "C", "C", "A", "C", "A", "B", "B", "B", "E", "G", "F",
"H", "F", "I", "D", "H", "H", "E", "I", "D", "D", "H", "G", "H",
"G", "I", "H", "E", "L", "K", "Q", "N", "O", "Q", "J", "K", "O",
"Q", "M", "K", "M", "Q", "P", "N", "J", "Q", "M", "O", "R", "Y",
"Z", "W", "V", "Y", "S", "V", "R", "T", "X", "R", "R", "W", "R",
"X", "Y", "Z", "Z", "Z", "b", "e", "c", "c", "b", "j", "h", "d",
"h", "a", "g", "d", "f", "a", "f", "h", "d", "g", "i", "j", "q",
"p", "t", "n", "u", "p", "q", "u", "t", "s", "q", "k", "k", "k",
"l", "s", "o", "v", "m", "m", "n H", "h A", "x J", "h T", "a H",
"t X", "h O", "m J", "g B", "f Z", "w X", "l S", "s F", "w R",
"v U", "z X", "e B", "c O", "n R", "b J"), dim = c(20L, 7L), dimnames = list(
c("seq_1", "seq_2", "seq_3", "seq_4", "seq_5", "seq_6", "seq_7",
"seq_8", "seq_9", "seq_10", "seq_11", "seq_12", "seq_13",
"seq_14", "seq_15", "seq_16", "seq_17", "seq_18", "seq_19",
"seq_20"), c("domain", "phylum", "class", "order", "family",
"genus", "species")))
Any package or code I can run in R is much appreciated! Thank you!
Using the function taxtotree() from the comecol package on GitHub, your data can be converted into a phylo object, the tree class used by the ape package, which can then write it out in Newick format.
# installing and loading libraries
if (!requireNamespace("devtools", quietly = TRUE)) {
  install.packages("devtools", dependencies = TRUE)
}
devtools::install_github("jgmv/comecol", upgrade = TRUE)
library(comecol)
library(ape)  # provides write.tree() and the plot method for phylo objects
# converting the data.frame `dat` (the taxonomy table from the question) to phylo
phy <- taxtotree(dat, a = 1, b = 7, include_rownames = TRUE)
# plot and save the tree in Newick format
plot(phy, show.node.label = TRUE)
write.tree(phy, file = "mytree.tre")
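If installing a package from GitHub is not an option, the Newick text can also be built directly in base R. The sketch below is a hypothetical helper, tax_to_newick(), that is not part of comecol or ape. It assumes the columns are ordered from the highest rank to the lowest, that row names are the sequence IDs (used as tip labels), and that unclassified ranks are NA. It groups rows rank by rank, so each taxon becomes a labelled clade, and replaces characters that Newick reserves (spaces, brackets, commas, colons, semicolons, quotes) with underscores.

```r
# Hypothetical helper (not from any package): turn a taxonomy table into a
# Newick string. Columns must run from the highest rank (left) to the lowest
# (right); row names become the tip labels.
tax_to_newick <- function(tax) {
  tax <- as.matrix(tax)
  if (is.null(rownames(tax))) stop("'tax' needs row names to use as tip labels")
  # Assumption: unclassified ranks are NA; label them so split() keeps those rows
  tax[is.na(tax)] <- "unassigned"
  # Newick reserves spaces, brackets, parentheses, commas, colons, semicolons
  # and quotes, so replace them with underscores in every label
  clean <- function(x) gsub("[\\[\\]\\(\\),:;' ]", "_", x, perl = TRUE)
  tax[] <- clean(tax)
  tips <- clean(rownames(tax))

  # Recursively group the rows by rank `col`; each group becomes a clade
  # labelled with its taxon name. Grouping within the parent's rows means a
  # name that reappears under different parents becomes separate clades.
  build <- function(rows, col) {
    if (col > ncol(tax)) return(paste(tips[rows], collapse = ","))
    groups <- split(rows, tax[rows, col])
    clades <- vapply(seq_along(groups), function(i) {
      paste0("(", build(groups[[i]], col + 1), ")", names(groups)[i])
    }, character(1))
    paste(clades, collapse = ",")
  }
  paste0("(", build(seq_len(nrow(tax)), 1), ");")
}
```

With the sample data from the question assigned to, say, taxa, nwk <- tax_to_newick(taxa) returns the Newick text, and writeLines(nwk, "mytree.tre") saves it. ape::read.tree(text = nwk) should parse it into a phylo object in recent ape versions; ape::collapse.singles() removes the one-child nodes that taxa with a single member produce. Note that this tree encodes only the taxonomy (no branch lengths), so it is a classification tree rather than a phylogeny inferred from the sequences.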