I need to store and then operate on (add new nodes, search through, etc.) a tree where every node is a pair of x,y coordinates. I found the ete2 module for working with trees, but I can't figure out how to store a node as a tuple or list of coordinates. Is that possible with ete2?
Edit:
I followed the tutorial here http://pythonhosted.org/ete2/tutorial/tutorial_trees.html#trees to create a simple tree:
t1 = Tree("(A:1,(B:1,(E:1,D:1):0.5):0.5);" )
where A, B, E and D are node names and the numbers are branch distances,
or
t2 = Tree( "(A,B,(C,D));" )
I don't need names or distances, but a tree of tuples or lists, something like:
t3 = Tree("([12.01, 10.98], [15.65, 12.10],([21.32, 6.31], [14.53, 10.86]));")
But the last input raises a syntax error, and I couldn't find a similar example in the ete2 tutorials. As an alternative, I could save the coordinates as attributes, but attributes are stored as strings. I need to do arithmetic on the coordinates, and converting them from string to float and back every time is awkward.
You can annotate ete trees using any type of data. Just give a name to every node, create a tree structure using such names, and annotate the tree with the coordinates.
from ete2 import Tree
name2coord = {
'a': [1, 1],
'b': [1, 1],
'c': [1, 0],
'd': [0, 1],
}
# Use format 1 to read node names of all internal nodes from the newick string
t = Tree('((a:1.1, b:1.2)c:0.9, d:0.8);', format=1)
for n in t.get_descendants():
    n.add_features(coord = name2coord[n.name])
# Now you can operate with the tree and node coordinates in a very easy way:
for leaf in t.iter_leaves():
    print leaf.name, leaf.coord
# a [1, 1]
# b [1, 1]
# d [0, 1]
print t.search_nodes(coord=[1,0])
# [Tree node 'c' (0x2ea635)]
You can copy, save and restore annotated trees using pickle:
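t_copy = t.copy('cpickle')
# or
import cPickle
# Open pickle files in binary mode so this also works on Windows
with open('mytree.pkl', 'wb') as f:
    cPickle.dump(t, f)
with open('mytree.pkl', 'rb') as f:
    tree = cPickle.load(f)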
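If you would rather start from nested lists of coordinates like the ones in the question than type the names by hand, the names and the newick string can be generated. This is only a sketch using plain Python (no ete2 calls): the helper coords_to_newick and the generated names n0, n1, ... are made up for illustration, and it assumes that only leaves carry coordinates and that every leaf is a pair of numbers.

```python
def coords_to_newick(nested):
    """Turn nested lists of [x, y] leaves into a newick string
    plus a name -> coordinate dictionary (hypothetical helper)."""
    name2coord = {}

    def is_coord(item):
        # A leaf is a pair of numbers; anything else is a list of children
        return len(item) == 2 and all(isinstance(v, (int, float)) for v in item)

    def walk(item):
        if is_coord(item):
            # Generate a unique placeholder name for this leaf
            name = 'n%d' % len(name2coord)
            name2coord[name] = list(item)
            return name
        return '(' + ','.join(walk(child) for child in item) + ')'

    return walk(nested) + ';', name2coord


nested = [[12.01, 10.98], [15.65, 12.10], [[21.32, 6.31], [14.53, 10.86]]]
newick, name2coord = coords_to_newick(nested)
print(newick)  # (n0,n1,(n2,n3));
```

The resulting string can be passed to Tree(newick, format=1), and the leaves annotated from name2coord with add_features as shown above. Internal nodes get no name or coordinate in this sketch, so annotate only the leaves (t.iter_leaves()) rather than all descendants.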