I am searching for the most efficient tree search implementation in python. I give the tree search a sequence of length n and it should detect if the branches are already created, or if this is not the case, generate the branches.
Example:
i1: Sequence 1[0.89,0.43,0.28]
0.89 check
|
0.43 check
|
0.28 check(last branch, last number of sequence == found)
i2: Sequence 2[0.89,0.43,0.99]
0.89 check
|
0.43 check
| |
0.28 missing(Creating new branch) 0.99
Considering the order within the sequences is important.
The goal is to keep track of a huge range of sequence (seen, unseen).
Has anyone ideas?
You could use an infinitely nested collections.defaultdict
for this. The following function will create a defaultdict
, that whenever the requested value is not present will call the same function again, creating another defaultdict
, ad infinitum.
import collections
nested = lambda: collections.defaultdict(nested)
dic = nested()
Now, you can add the sequences to the nested defaultdict. You can do this in a loop, or recursively, or simply use reduce
:
s1 = [0.89,0.43,0.28]
s2 = [0.89,0.43,0.99]
from functools import reduce # Python 3
reduce(lambda d, x: d[x], s1, dic)
reduce(lambda d, x: d[x], s2, dic)
Afterwards, dic
looks like this: (Actually, it looks a bit different, but that's only because of defaultdict
also printing the function it was created with.)
{0.89: {0.43: {0.28: {}, 0.99: {}}}}
If by "the order of the sequences is important" you mean the order in which the sequences are added, and not the order within the sequences, you will have to use a collections.OrderedDict
instead. In this case, the adding of new elements is a bit more involved, but not by much.
dic = collections.OrderedDict()
def putall(d, s):
for x in s:
if x not in d:
d[x] = collections.OrderedDict()
d = d[x]
putall(dic, s1)
putall(dic, s2)