pythonsequencetree-search

Python - Tree Search


I am searching for the most efficient tree search implementation in python. I give the tree search a sequence of length n and it should detect if the branches are already created, or if this is not the case, generate the branches.

Example:

i1: Sequence 1[0.89,0.43,0.28]

      0.89   check
       |
      0.43   check
       |
      0.28   check(last branch, last number of sequence == found)

i2: Sequence 2[0.89,0.43,0.99]

      0.89   check
       |
      0.43   check
       |                                           |
      0.28   missing(Creating new branch)         0.99

Considering the order within the sequences is important.

The goal is to keep track of a huge range of sequence (seen, unseen).

Has anyone ideas?


Solution

  • You could use an infinitely nested collections.defaultdict for this. The following function will create a defaultdict, that whenever the requested value is not present will call the same function again, creating another defaultdict, ad infinitum.

    import collections
    nested = lambda: collections.defaultdict(nested)
    dic = nested()
    

    Now, you can add the sequences to the nested defaultdict. You can do this in a loop, or recursively, or simply use reduce:

    s1 = [0.89,0.43,0.28]
    s2 = [0.89,0.43,0.99]
    
    from functools import reduce # Python 3
    reduce(lambda d, x: d[x], s1, dic)
    reduce(lambda d, x: d[x], s2, dic)
    

    Afterwards, dic looks like this: (Actually, it looks a bit different, but that's only because of defaultdict also printing the function it was created with.)

    {0.89: {0.43: {0.28: {}, 0.99: {}}}}
    

    If by "the order of the sequences is important" you mean the order in which the sequences are added, and not the order within the sequences, you will have to use a collections.OrderedDict instead. In this case, the adding of new elements is a bit more involved, but not by much.

    dic = collections.OrderedDict()
    
    def putall(d, s):
        for x in s:
            if x not in d:
                d[x] = collections.OrderedDict()
            d = d[x]
    
    putall(dic, s1)
    putall(dic, s2)