r ape-phylo

Counting phylogenetic tree topologies in R


Given a multiPhylo object in R, what's the simplest way to count the number of duplicate topologies? For instance, if I randomly sample from all 15 possible resolutions of a 4-tip topology:

library(ape)
library(phytools)
m <- do.call(c, lapply(1:1000, function(x) multi2di(starTree(c('a','b','c','d')))))

I will have 1000 trees drawn from 15 possible topologies. What's the simplest way to tabulate the number of trees with each topology (i.e. so that the counts sum to 1000)?


Solution

  • Small trees

    With smallish trees (< ~20 leaves), you can use the 'TreeTools' package to convert each tree topology to a unique integer:

    library('TreeTools')
    library('phytools')
    m <- do.call(c, lapply(1:1000, function(x) multi2di(starTree(c('a','b','c','d')))))
    
    # Tabulate unique topologies
    table(vapply64(m, as.TreeNumber, 1))
    

    You can then plot any numbered topology using:

    topologyToPlot <- 2
    plot(as.phylo(topologyToPlot, nTip = 4))
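
    The figure of 15 topologies for four tips follows from the standard count of rooted, binary, labelled trees on n tips, (2n - 3)!!. The base R sketch below shows how quickly that count grows. The helper name `n_rooted_topologies` is made up for illustration, and linking the growth to the "< ~20 leaves" guideline is an inference rather than something the answer states.

    ```r
    # Number of rooted, binary, labelled tree topologies on n tips: (2n - 3)!!
    # `n_rooted_topologies` is a hypothetical helper, not part of any package.
    n_rooted_topologies <- function(n) {
      stopifnot(n >= 2)
      # Product of the odd numbers 1, 3, ..., 2n - 3
      prod(seq(1, 2 * n - 3, by = 2))
    }

    n_rooted_topologies(4)  # 15

    # Smallest tip count whose topology count exceeds the range of a
    # signed 64-bit integer (computed in doubles, so only approximate
    # magnitudes are compared)
    tips <- 2:25
    first_overflow <- min(tips[sapply(tips, n_rooted_topologies) > 2^63])
    first_overflow
    ```

    Because the count grows super-exponentially, a one-integer-per-topology encoding is only practical for small trees.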
    

  • Big trees

    For larger trees, you can ensure that trees with an equivalent topology are represented identically within R by:

    Trees can then be compared using edge matrices as suggested by user12728748.
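
    As a rough alternative that works for trees of any size, and is not necessarily the approach meant above, ape's `unique()` method for multiPhylo objects keeps one tree per topology and records which input trees matched it in an `"old.index"` attribute. The sketch below uses three toy Newick trees (made up for illustration) so the result is easy to check by eye; substitute your own multiPhylo object for `trees`.

    ```r
    library(ape)

    # Toy input: three rooted 4-tip trees. The first two share a topology
    # and differ only in the order in which their clades are written.
    trees <- read.tree(text = c(
      "((a,b),(c,d));",
      "((d,c),(b,a));",
      "(((a,b),c),d);"
    ))

    # unique() keeps one representative per topology (edge lengths are
    # ignored by default). Its "old.index" attribute gives, for each input
    # tree, the index of the first input tree with the same topology.
    distinct <- unique(trees)
    first_match <- attr(distinct, "old.index")

    # Number of input trees per distinct topology; names are the index of
    # the first tree showing that topology.
    counts <- table(first_match)
    print(counts)
    ```

    To inspect a topology, plot the corresponding representative, e.g. `plot(distinct[[1]])`. Since `unique()` compares trees pairwise, it may become slow for very long tree lists.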