python bioinformatics phylogeny dendropy

DendroPy: Add inner node midway between two nodes


I'm very new to DendroPy. What I want to do seems simple, but I can't figure out how to do it correctly and I couldn't find anything on the internet.

I want to add a node midway between two nodes in an existing rooted DendroPy tree.

from dendropy import Tree, Taxon, Node

t1 = Tree.get_from_string(
    "(((sp1: 0.35, sp2: 0.15):0.75, sp3:1): 0.5, (sp4: 0.5, sp5: 0.05)MRCA_sp4&sp5: 1)root;",
    "newick", rooting='force-rooted'
)
t1.print_plot()
mrca = t1.mrca(taxon_labels=["sp4", "sp5"])
print(mrca.description())

Tree and MRCA node description

The MRCA of sp4 and sp5 is found correctly. Now I'm trying to add a node midway between MRCA and root, using the code below:

def add_node_midway_between_2_nodes(lowernode, taxon_label=None, node_label=None):
    newtaxon = Taxon(label=taxon_label)
    newnode = Node(taxon=newtaxon, label=node_label)
    newnode.parent_node = lowernode.parent_node
    newnode.edge_length = lowernode.edge_length/2
    lowernode.parent_node = newnode
    lowernode.edge_length = newnode.edge_length
    return newnode

node = add_node_midway_between_2_nodes(mrca, node_label="midway between root and MRCA sp4&sp5")
t1.print_plot()
str_t1 = t1.as_string(schema='newick')
print(str_t1)

Tree with a node between root and MRCA sp4&sp5

[&R] (((sp1:0.35,sp2:0.15):0.75,sp3:1.0):0.5,((sp4:0.5,sp5:0.05)MRCA_sp4&sp5:0.5)midway_between_root_and_MRCA_sp4&sp5:0.5)root;

Looking at the plot and at the string, it seems to have worked. But when I compute the MRCA of sp4 and sp5 again, it no longer finds "MRCA sp4&sp5" and returns the root node instead.

mrca = t1.mrca(taxon_labels=["sp4", "sp5"])
print(mrca.description())

Output = Description of root node

Going through parent_node from sp5, I do still find "MRCA sp4&sp5".

Out of desperation, I tried to rebuild the tree from the string str_t1, but that doesn't work either and even gives me a different (equally incorrect) result: the node "midway between root and MRCA sp4&sp5".

t1 = Tree.get_from_string(
    str_t1,
    "newick", rooting='force-rooted'
)
mrca = t1.mrca(taxon_labels=["sp4", "sp5"])
print(mrca.description())

Output = description of node "midway between root and MRCA sp4&sp5"

So what is a clean way to add a node midway between two nodes that doesn't cause strange behavior afterward?

Thank you very much


Solution

  • Your code mostly works, but after changing a tree's topology you should call update_taxon_namespace and update_bipartitions so that the changes are applied correctly, as the documentation recommends. In your case, it would look like this:

    def add_node_midway_between_2_nodes(lowernode, taxon_label=None, node_label=None):
        newtaxon = Taxon(label=taxon_label)
        newnode = Node(taxon=newtaxon, label=node_label)
        newnode.parent_node = lowernode.parent_node
        newnode.edge_length = lowernode.edge_length/2
        lowernode.parent_node = newnode
        lowernode.edge_length = newnode.edge_length
        return newnode
    
    node = add_node_midway_between_2_nodes(
        mrca, node_label="midway between root and MRCA sp4&sp5"
    )
    t1.update_taxon_namespace()
    t1.update_bipartitions(
        suppress_unifurcations=False, suppress_storage=True
    )  # suppress_storage is optional, I just do not want to create a bipartitions list
    t1.print_plot()
    str_t1 = t1.as_string(schema='newick')
    print(str_t1)
    

    NB!

    The taxon namespace must be updated before the bipartitions, because update_bipartitions relies on a correct TaxonNamespace. Otherwise, you still get the strange behavior.

    That said, it is better to use the built-in Node methods for fine-grained tree editing. For instance, I would rewrite the function this way:

    from numbers import Real
    from typing import Optional

    from dendropy import Node, Taxon


    def insert_new_node_posterior(
        node: Node,
        *,
        taxon_label: Optional[str] = None,
        node_label: Optional[str] = None,
        edge_length: Real,
        # If this were production code, or at least meant for reuse,
        # I would add more arguments for specifying the proportion,
        # e.g. by height or by distance from the root.
    ) -> Node:
        parent = node.parent_node
        if parent is None:
            raise ValueError("Cannot insert a node above the root.")

        new_taxon = Taxon(label=taxon_label)
        # Remember the child's position so the new node takes its place.
        i = parent.child_nodes().index(node)
        parent.remove_child(node)
        intermediate_node = parent.insert_new_child(
            index=i, taxon=new_taxon, label=node_label, edge_length=edge_length
        )
        # Split the original branch: the new node gets edge_length,
        # the old node keeps the remainder.
        node.edge_length -= edge_length
        intermediate_node.add_child(node)
        return intermediate_node
    
    
    node = insert_new_node_posterior(
        mrca,
        node_label="midway between root and MRCA sp4&sp5",
        edge_length=mrca.edge_length / 2
    )
    t1.update_taxon_namespace()
    t1.update_bipartitions(
        suppress_unifurcations=False, suppress_storage=True
    )
    
    t1.print_plot()
    str_t1 = t1.as_string(schema='newick')
    print(str_t1)
    

    Nonetheless, Tree.mrca still returns an unexpected node:

    Node object at 0x1be8b959460<Node object at 0x1be8b959460: 'midway between root and MRCA sp4&sp5' (<Taxon 0x1be8cdceeb0 'None'>)>
        [Edge]
            Edge object at 0x1be8b959400 (1917897249792, Length=0.5)
        [Taxon]
             Taxon object at 0x1be8cdceeb0: <Unnamed Taxon>
        [Parent]
            Node object at 0x1be8c3b2b80<Node object at 0x1be8c3b2b80: 'root' (None)>
        [Children]
            [0] Node object at 0x1be8be4d6a0<Node object at 0x1be8be4d6a0: 'MRCA sp4&sp5' (None)>
    

    However, this is not a bug; it is simply how the method works, because of this part of the source code:

    if cms:
        # for at least one taxon cm has 1 and bipartition has 1
        if cms == leafset_bitmask:
            # curr_node has all of the 1's that bipartition has
            if cm == leafset_bitmask:
                return curr_node  # Vovin's comment: Since there is a unifurcation,
                                  # it returns the current node
                                  # instead of the next iteration
            last_match = curr_node
            nd_source = iter(curr_node.child_nodes())
        else:
            # we have reached a child that has some, but not all of the
            #   required taxa as descendants, so we return the last_match
            return last_match
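
The early return in the excerpt above can be reproduced with a small toy model that does not need DendroPy at all. This is only a simplified sketch of the same top-down search, not the library's implementation: ToyNode, build_tree and toy_mrca are hypothetical names, and the bit values assigned to the leaves are arbitrary. It assumes that an internal node's mask is the bitwise OR of its children's masks, so a node with a single child has exactly the same mask as that child.

```python
class ToyNode:
    """Hypothetical stand-in for a tree node; not a DendroPy class."""

    def __init__(self, label, children=(), bit=0):
        self.label = label
        self.children = list(children)
        # A leaf carries its own bit; an internal node ORs its children's masks.
        self.mask = bit
        for child in self.children:
            self.mask |= child.mask


def toy_mrca(root, target_mask):
    """Top-down search modeled on the quoted excerpt (simplified)."""
    if (root.mask & target_mask) != target_mask:
        return None
    curr_node = last_match = root
    nd_source = iter(root.children)
    while True:
        cm = curr_node.mask
        cms = cm & target_mask
        if cms:
            if cms == target_mask:
                if cm == target_mask:
                    # First node whose mask equals the target: returned at once.
                    return curr_node
                last_match = curr_node
                nd_source = iter(curr_node.children)
            else:
                return last_match
        try:
            curr_node = next(nd_source)
        except StopIteration:
            return last_match


def build_tree(add_sp6=False):
    """Build the topology from the question, optionally with an extra leaf sp6."""
    sp1, sp2, sp3 = ToyNode("sp1", bit=1), ToyNode("sp2", bit=2), ToyNode("sp3", bit=4)
    sp4, sp5 = ToyNode("sp4", bit=8), ToyNode("sp5", bit=16)
    mrca45 = ToyNode("MRCA sp4&sp5", [sp4, sp5])
    extra = [ToyNode("sp6", bit=32)] if add_sp6 else []
    midway = ToyNode("midway", [mrca45] + extra)
    left = ToyNode("left", [ToyNode("sp1+sp2", [sp1, sp2]), sp3])
    return ToyNode("root", [left, midway])


target = 8 | 16  # sp4 and sp5
print(toy_mrca(build_tree(), target).label)              # midway
print(toy_mrca(build_tree(add_sp6=True), target).label)  # MRCA sp4&sp5
```

In the first tree the single-child "midway" node has the same mask as its child, so the search stops there. In the second tree the extra leaf changes the mask of "midway", and the search continues down to "MRCA sp4&sp5".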
    

    For example, if we add a second child to the new node, so that it is no longer a unifurcation, Tree.mrca returns the expected node:

    node = insert_new_node_posterior(
        mrca,
        node_label="midway between root and MRCA sp4&sp5",
        edge_length=mrca.edge_length / 2
    )
    node.new_child(label="sp6", edge_length=1, taxon=Taxon(label="sp6"))
    t1.update_taxon_namespace()
    t1.update_bipartitions(
        suppress_unifurcations=False, suppress_storage=True
    )
    mrca = t1.mrca(taxon_labels=["sp4", "sp5"])
    print(mrca.description())
    
    Node object at 0x1be8c1e22b0<Node object at 0x1be8c1e22b0: 'MRCA sp4&sp5' (None)>
        [Edge]
            Edge object at 0x1be8c1e2340 (1917906199360, Length=0.5)
        [Taxon]
            None
        [Parent]
            Node object at 0x1be8ba8da30<Node object at 0x1be8ba8da30: 'midway between root and MRCA sp4&sp5' (<Taxon 0x1be8ba8d370 'None'>)>
        [Children]
            [0] Node object at 0x1be8c1e2910<Node object at 0x1be8c1e2910: 'None' (<Taxon 0x1be8c1e2640 'sp4'>)>
            [1] Node object at 0x1be8c1e28e0<Node object at 0x1be8c1e28e0: 'None' (<Taxon 0x1be8c1e2e50 'sp5'>)>
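
If adding an extra child is not an option, one possible workaround is to post-process the node returned by Tree.mrca and walk down past any single-child nodes. This is a sketch, not a DendroPy feature: skip_unifurcations is a hypothetical helper, and StubNode is a stand-in used only so the example runs without DendroPy. The helper relies solely on child_nodes(), which the code above already calls on real Node objects.

```python
def skip_unifurcations(node):
    """Descend while `node` has exactly one child; return the first node that does not."""
    children = node.child_nodes()
    while len(children) == 1:
        node = children[0]
        children = node.child_nodes()
    return node


class StubNode:
    """Hypothetical stand-in exposing only child_nodes(), for demonstration."""

    def __init__(self, label, children=()):
        self.label = label
        self._children = list(children)

    def child_nodes(self):
        return list(self._children)


# Mimic the situation above: "midway" has a single child, "MRCA sp4&sp5".
mrca45 = StubNode("MRCA sp4&sp5", [StubNode("sp4"), StubNode("sp5")])
midway = StubNode("midway between root and MRCA sp4&sp5", [mrca45])

print(skip_unifurcations(midway).label)  # MRCA sp4&sp5
print(skip_unifurcations(mrca45).label)  # MRCA sp4&sp5
```

With the real tree, the intended (untested here) usage would be skip_unifurcations(t1.mrca(taxon_labels=["sp4", "sp5"])), called after update_taxon_namespace and update_bipartitions.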