python tree etetoolkit

How can I create a tree using the python package ete2 using strings stored in a list?


I am trying to make a phylogenetic tree using the python package ete2 from synthetic data output from a cellular automaton model of mine. The data consists of pairs listed as (parent, child) where each member of the pair is a unique integer representing a mutation event. I have recast each of the members of the pair as strings and preceded them with 'r', so now:

('r1' ,'r2') would represent a parent called 'r1' giving rise to a child called 'r2'. So the output file looks like:

[['r1' 'r2']
 ['r1' 'r3']
 ['r1' 'r4']
 ['r1' 'r5']
 ['r1' 'r6']
 ['r1' 'r7']
 ['r1' 'r8']
 ['r1' 'r9']
 ['r2' 'r10']
 ['r1' 'r11']
 ['r1' 'r12']
 ['r8' 'r13']
 ['r1' 'r14']
 ['r4' 'r15']
 ['r1' 'r16']
 ['r1' 'r17']
 ['r1' 'r18']
 ['r1' 'r19']]

I want to iterate over the list to make the tree using 'add_child' but keep getting errors. My current code is:

t = Tree() # Creates an empty tree
r1 = t.add_child(name="r1")

for row in range(0, len(pairs_list)):
    a = str(pairs_list[row,1])
    b = str(pairs_list[row,0])
    a = b.add_child(name = a)

and I get the error:

Traceback (most recent call last):
  File "treetest.py", line 33, in <module>
    a = b.add_child(name = a)
AttributeError: 'str' object has no attribute 'add_child'

If I replace the 'b' in the last line of my code with r1 (or something else) it works fine, but of course that doesn't represent the data... thanks in advance, universe.


Solution

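  • In your loop 'b' is a plain string, not a tree node, which is why 'add_child' fails. Keep a dictionary that maps each node name to its node object and look the parent up by name. Something like this: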

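    from ete2 import Tree
    
    t = Tree() # Creates an empty tree
    r1 = t.add_child(name="r1")
    lookup = {"r1": r1}  # Maps node name -> node object.
    
    def sort_pairs(pair):
        # Extract the integer after "r" in the parent name.
        return int(pair[0][1:])
    
    # Visit pairs in order of parent number so that each parent is already
    # in the tree (assumes a parent's number is lower than its children's).
    for pair in sorted(pairs_list, key=sort_pairs):
        parentname = pair[0]
        childname = pair[1]
        if childname not in lookup:
            if parentname in lookup:
                # Add child and remember its node object for later lookups.
                newchild = lookup[parentname].add_child(name = childname)
                lookup[childname] = newchild
            else:
                raise RuntimeError('Must not happen.')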
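
If you cannot rely on parents being numbered lower than their children, one alternative is to build a Newick string from the pairs first, which works for any input order. The following is only a sketch in plain Python that does not use ete2 at all; the function name pairs_to_newick and the shortened sample data are made up for illustration, and it assumes the pairs describe a single tree with exactly one root.

```python
from collections import defaultdict


def pairs_to_newick(pairs):
    """Build a Newick string from (parent, child) name pairs in any order.

    Hypothetical helper, not part of ete2.
    """
    children = defaultdict(list)  # parent name -> child names, in input order
    has_parent = set()            # every name that appears as a child
    for parent, child in pairs:
        children[parent].append(child)
        has_parent.add(child)

    # The root is the only parent that never appears as a child.
    roots = [name for name in children if name not in has_parent]
    if len(roots) != 1:
        raise ValueError("expected exactly one root, found %d" % len(roots))

    def render(name):
        # Recursive, so very deep trees could hit Python's recursion limit.
        kids = children.get(name, [])
        if not kids:
            return name  # leaf
        return "(" + ",".join(render(k) for k in kids) + ")" + name

    return render(roots[0]) + ";"


# Shortened sample in the same (parent, child) form as the question's data.
pairs = [("r1", "r2"), ("r2", "r10"), ("r1", "r3"), ("r1", "r4"), ("r4", "r15")]
newick = pairs_to_newick(pairs)
print(newick)
```

If ETE is installed, the resulting string should load with Tree(newick, format=1), since format 1 is the Newick flavour that keeps internal node names. I have not checked how ete2 treats internal nodes that have only one child, so test that on your own data.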