r tree nodes phylogeny ape

R phylo object: how to connect node label and node number


A phylo object in R can have internal node labels (phylo_obj$node.label), but many R functions use node numbers instead of node labels. Even the phylo object itself uses node numbers to describe its edges (phylo_obj$edge), and it does not seem to store a direct mapping from internal node labels to those node numbers. How do I map a node label (e.g., "NodeA" or "Artiodactyla") to its node number (e.g., 250 or 212)? I can't find any R function or documentation on this.


Solution

  • I'm not exactly sure what the objective is, but if you want to look up the node label that corresponds to a node number in the edge table, you can simply use tree$node.label[node_number - Ntip(tree)]. This works because internal nodes are numbered from Ntip(tree) + 1 upwards, in the same order as the node.label vector.

    In more detail:

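    ## Loading ape, which provides rtree(), getMRCA(), Ntip() and Nnode()
    library(ape)
    ## Simulating a random tree
    set.seed(1)
    my_tree <- rtree(10)
    ## Labelling the internal nodes "node1", "node2", ...
    my_tree$node.label <- paste0("node", seq_len(Nnode(my_tree)))
    ## Method 1: selecting a node of interest (e.g. MRCA)
    mrca_node <- getMRCA(my_tree, tip = c("t1", "t2"))
    #[1] 16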
    

    mrca_node is now the ID of the node in the edge table (in this case a number higher than 10, the number of tips). To get the corresponding node label, simply subtract the number of tips from mrca_node:

    ## The node label for the mrca_node
    my_tree$node.label[mrca_node-Ntip(my_tree)]
    #[1] "node6"
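
    The question asks for the opposite direction (label to node number). A minimal sketch of that lookup, assuming the labels in node.label are unique: find the label's position in node.label and add the number of tips. The helper name label_to_node and the small Newick tree are hypothetical, used only for illustration.

    ```r
    library(ape)

    ## Small illustrative tree with labelled internal nodes
    tree <- read.tree(text = "((A,B)NodeA,(C,D)NodeB)Root;")

    ## Hypothetical helper: internal node label(s) -> node number(s) as used in tree$edge
    label_to_node <- function(tree, labels) {
        idx <- match(labels, tree$node.label)   # position within node.label
        if (anyNA(idx)) stop("label not found in tree$node.label")
        idx + Ntip(tree)                        # internal nodes start at Ntip + 1
    }

    node_b <- label_to_node(tree, "NodeB")

    ## Round trip: node number back to its label
    tree$node.label[node_b - Ntip(tree)]

    ## Tips directly descending from that node, read from the edge table
    tree$tip.label[tree$edge[tree$edge[, 1] == node_b, 2]]
    ```

    Note that match() returns only the first hit, so duplicated labels (for example bootstrap values stored as node labels) would need which() instead.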
    

    Alternatively, you can attach the tip and node labels to every row of the edge table:

    ## Method 2: directly extracting the nodes from the edge tables
    # Function returning the tip or node label for a single node number:
    # numbers 1..Ntip(tree) are tips, higher numbers are internal nodes
    select.tip.or.node <- function(element, tree) {
        if (element <= Ntip(tree)) {
            tree$tip.label[element]
        } else {
            tree$node.label[element - Ntip(tree)]
        }
    }
    
    ## Making the edge table
    edge_table <- data.frame(
                    "parent" = my_tree$edge[,1],
                    "par.name" = sapply(my_tree$edge[,1],
                                        select.tip.or.node,
                                        tree = my_tree),
                    "child" = my_tree$edge[,2],
                    "chi.name" = sapply(my_tree$edge[,2],
                                        select.tip.or.node,
                                        tree = my_tree)
                    )
    #   parent par.name child chi.name
    #1      11    node1    12    node2
    #2      12    node2     1      t10
    #3      12    node2    13    node3
    #4      13    node3     2       t6
    #5      13    node3     3       t9
    #6      11    node1    14    node4
    #7      14    node4    15    node5
    #8      15    node5    16    node6
    #9      16    node6     4       t1
    #10     16    node6    17    node7
    #11     17    node7     5       t2
    #12     17    node7     6       t7
    #13     15    node5     7       t3
    #14     14    node4    18    node8
    #15     18    node8    19    node9
    #16     19    node9     8       t8
    #17     19    node9     9       t4
    #18     18    node8    10       t5
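
    The same table can also be built without a helper function. The sketch below relies on the numbering described above (tips first, then internal nodes), so that c(tip.label, node.label) is indexed directly by node number. The names all_labels, edge_names and node_of, and the small Newick tree, are illustrative only; it assumes the tree has a complete node.label vector and that all labels are unique.

    ```r
    library(ape)

    ## Small illustrative tree with labelled internal nodes
    tree <- read.tree(text = "((A,B)NodeA,(C,D)NodeB)Root;")

    ## Labels indexed by node number: tips first, then internal nodes
    all_labels <- c(tree$tip.label, tree$node.label)

    ## Edge table with names, using plain vector indexing
    edge_names <- data.frame(
        parent   = tree$edge[, 1],
        par.name = all_labels[tree$edge[, 1]],
        child    = tree$edge[, 2],
        chi.name = all_labels[tree$edge[, 2]]
    )

    ## Reverse lookup: any label (tip or internal node) -> node number
    node_of <- setNames(seq_along(all_labels), all_labels)
    node_of[["NodeA"]]
    ```

    Because node_of is a named vector, several labels can be looked up at once with node_of[c("NodeA", "NodeB")].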