Tags: r, phylogeny, ggtree

Convert CSV data to Newick format (phylogenetic tree) in R


I have just started learning bioinformatic analyses, so my question may be very basic. I have microbiome data with around 33,000 sequences in rows and their taxonomy in columns: domain, phylum, class, order, family, genus, and species. I want to convert this data into Newick format so I can build a phylogenetic tree from it. Some questions I found that are similar to mine:

  • "Trying to write data into newick format R": the answer to this question is the closest to what I want, but doing it manually is impossible given the size of my data.
  • "How to create Newick tree format from raw morphology data in R on Mac OSX": I tried the phangorn package, but it did not work for me.
  • "Convert csv to Newick tree": I don't know Python, so I have difficulty fully understanding this code.

I have made a sample taxonomy table.

Taxa data:

    dat <- structure(c("A", "C", "B", "A", "C", "B", "C", "B", "B", "C", 
    "C", "B", "C", "C", "A", "C", "A", "B", "B", "B", "E", "G", "F", 
    "H", "F", "I", "D", "H", "H", "E", "I", "D", "D", "H", "G", "H", 
    "G", "I", "H", "E", "L", "K", "Q", "N", "O", "Q", "J", "K", "O", 
    "Q", "M", "K", "M", "Q", "P", "N", "J", "Q", "M", "O", "R", "Y", 
    "Z", "W", "V", "Y", "S", "V", "R", "T", "X", "R", "R", "W", "R", 
    "X", "Y", "Z", "Z", "Z", "b", "e", "c", "c", "b", "j", "h", "d", 
    "h", "a", "g", "d", "f", "a", "f", "h", "d", "g", "i", "j", "q", 
    "p", "t", "n", "u", "p", "q", "u", "t", "s", "q", "k", "k", "k", 
    "l", "s", "o", "v", "m", "m", "n H", "h A", "x J", "h T", "a H", 
    "t X", "h O", "m J", "g B", "f Z", "w X", "l S", "s F", "w R", 
    "v U", "z X", "e B", "c O", "n R", "b J"), dim = c(20L, 7L), dimnames = list(
        c("seq_1", "seq_2", "seq_3", "seq_4", "seq_5", "seq_6", "seq_7", 
        "seq_8", "seq_9", "seq_10", "seq_11", "seq_12", "seq_13", 
        "seq_14", "seq_15", "seq_16", "seq_17", "seq_18", "seq_19", 
        "seq_20"), c("domain", "phylum", "class", "order", "family", 
        "genus", "species")))
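In the sample data, some labels occur under more than one parent at the next-higher rank; for example, class Q appears under phyla F, I, E and H. Tools that build a tree from rank columns may assume each label has a single parent, so it can be worth checking this before converting. The following is a minimal base-R sketch; `find_non_nested()` is a hypothetical helper name, and it assumes the table is stored as `dat` (as in the solution) with columns ordered from domain to species.

```r
# Hypothetical helper (not from any package): for each pair of adjacent
# rank columns, report child labels that appear under more than one
# parent label. Assumes `tax` is a matrix or data.frame with column names,
# ordered from the highest rank to the lowest.
find_non_nested <- function(tax) {
  tax <- as.matrix(tax)
  result <- list()
  for (j in seq_len(ncol(tax))[-1]) {
    # distinct (parent, child) combinations for this rank
    pairs <- unique(data.frame(parent = tax[, j - 1], child = tax[, j]))
    # a child label counted more than once has several parents
    counts <- table(pairs$child)
    shared <- names(counts)[counts > 1]
    if (length(shared) > 0) {
      result[[colnames(tax)[j]]] <- shared
    }
  }
  result
}

# Usage with the sample table: find_non_nested(dat)
# The "class" element of the result includes "Q".
```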

Any package or code I can run in R is much appreciated! Thank you!


Solution

  • Using the function taxtotree from the comecol package on GitHub (jgmv/comecol), your data can be converted into a phylo object, the tree class used by the ape package, which can then be saved in Newick format.

    # install devtools if needed, then install comecol from GitHub
    if (!requireNamespace("devtools", quietly = TRUE)) {
        install.packages("devtools", dependencies = TRUE)
    }
    devtools::install_github("jgmv/comecol", upgrade = TRUE)
    library(comecol)
    library(ape)  # provides plot() for phylo objects and write.tree()
    
    # convert the taxonomy table (dat, from the question) to a phylo object
    phy <- taxtotree(dat, a = 1, b = 7, include_rownames = TRUE)
    
    # plot the tree and save it in Newick format
    plot(phy, show.node.label = TRUE)
    write.tree(phy, file = "mytree.tre")
    

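If you would rather not install a GitHub package, a taxonomy table can also be turned into a Newick string in base R by recursively grouping rows by each rank, from domain down to species. This is a minimal sketch, not comecol's implementation; `taxonomy_to_newick()`, `build_clade()` and `safe_label()` are hypothetical helper names. It assumes row names are the sequence IDs, columns are ordered from domain to species, and no rank is missing (`split()` silently drops `NA` groups).

```r
# Hypothetical helpers: build a Newick string from a taxonomy table.
# Rows are sequences (row names become tip labels); columns are ranks
# ordered from highest (domain) to lowest (species).

safe_label <- function(x) {
  # Newick reserves spaces, parentheses, commas, colons, semicolons and
  # quotes, so replace anything unusual with an underscore
  gsub("[^A-Za-z0-9_.-]", "_", x)
}

build_clade <- function(tax, tips) {
  # no ranks left: the remaining rows are the tips of this clade
  if (ncol(tax) == 0) {
    return(paste(safe_label(tips), collapse = ","))
  }
  # group rows by the current (highest remaining) rank
  groups <- split(seq_len(nrow(tax)), tax[, 1])
  children <- vapply(names(groups), function(g) {
    idx <- groups[[g]]
    # recurse on the lower ranks of this group only
    inner <- build_clade(tax[idx, -1, drop = FALSE], tips[idx])
    paste0("(", inner, ")", safe_label(g))
  }, character(1))
  paste(children, collapse = ",")
}

taxonomy_to_newick <- function(tax) {
  tax <- as.matrix(tax)
  paste0("(", build_clade(tax, rownames(tax)), ");")
}

# Small example
tax <- matrix(c("A", "A", "B", "x", "y", "z"), ncol = 2,
              dimnames = list(c("s1", "s2", "s3"), c("domain", "phylum")))
taxonomy_to_newick(tax)
# → "(((s1)x,(s2)y)A,((s3)z)B);"
```

With the sample table, `writeLines(taxonomy_to_newick(dat), "mytree.tre")` would save the result. Because grouping happens within each parent, a label that occurs under several parents (such as class Q) simply becomes separate nodes. Every sequence sits under its own species node, so the string contains single-child nodes; if ape is installed, `read.tree(text = ...)` should parse it and `collapse.singles()` can remove them.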