python python-itertools

How to divert data in an iterator into two others?


I know that I can copy an iterator using

x1, x2 = itertools.tee(x)

Then, in order to get two generators, I could filter:

filter(..., x1); filter(..., x2)

However, that runs the same computation twice, since every element of x is processed once via x1 and once via x2.
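
To make the duplicated work concrete, here is a minimal sketch of that tee-and-filter approach. The predicate is_even, the call counter, and the sample data are made up for illustration; itertools.filterfalse is used for the complementary half.

```python
import itertools

calls = 0  # counts how often the predicate runs


def is_even(n):
    # Hypothetical predicate, only for illustration
    global calls
    calls += 1
    return n % 2 == 0


x = iter(range(6))
x1, x2 = itertools.tee(x)

# Each copy is filtered separately, so the predicate sees every element twice.
even_values = list(filter(is_even, x1))
odd_values = list(itertools.filterfalse(is_even, x2))

print(even_values)  # [0, 2, 4]
print(odd_values)   # [1, 3, 5]
print(calls)        # 12, i.e. two predicate calls per element
```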

So I would like something more efficient, like this:

x1, x2 = divert(into x1 if ... else x2, x)

Does anything like this exist in Python 3?


Solution

  • There's no built-in tool for this in Python that I know of. It's a bit tricky to get working, because there is no guarantee about the order in which the two resulting iterators will be consumed.

    For example, x could produce an x1 value followed by an x2 value, but your code could iterate over x1 until it yields some signal value, then iterate over x2 until it yields some signal value, and so on. So the code basically has to hold all the x2 values until the next x1 value is generated, which can be arbitrarily late.

    If that's really what you want to do, here is a quick idea for implementing such a buffer. Warning: it is untested and assumes x is an endless generator. You also have to write two actual iterator classes that implement __next__ by delegating to this shared object, one with category == True and the other with category == False.

    class SeparatedIterator:
        def __init__(self, iterator, predicate):
            self.it = iterator
            self.f = predicate
            # The buffer contains pairs of (value, predicate_result)
            self.valueBuffer = []

        def generate(self):
            # Pull one value from the source and buffer it with its category.
            # Raises StopIteration if the source runs out.
            value = next(self.it)
            filtered = bool(self.f(value))
            self.valueBuffer.append((value, filtered))

        def nextValue(self, category):
            # Search the stored values first, oldest first
            for i in range(len(self.valueBuffer)):
                value, filtered = self.valueBuffer[i]
                if filtered == category:
                    del self.valueBuffer[i]
                    return value

            # Otherwise, if no value of this category is buffered,
            # generate until one of this category is produced
            self.generate()
            while self.valueBuffer[-1][1] != category:
                self.generate()

            # Pop the value and return it
            value, _ = self.valueBuffer.pop()
            return value
    

    Otherwise, if you have more control over the order in which the iterators are called, use that knowledge to implement a more customized and optimized way to switch between the iterators' values.
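
As a self-contained variant of the buffering idea above, here is a rough sketch that uses two generators sharing one source, with one deque per category instead of a single list. The name divert is borrowed from the question and is not an existing library function; is_even and the calls list are hypothetical helpers for the demonstration. Unlike the class above, it stops cleanly when the source is exhausted, but the buffer can still grow without bound if one side is never consumed.

```python
from collections import deque


def divert(predicate, iterable):
    """Return (true_items, false_items); predicate runs once per element."""
    source = iter(iterable)
    # One queue per category; values wait here until their side asks for them.
    pending = {True: deque(), False: deque()}

    def side(category):
        mine = pending[category]
        while True:
            if mine:
                # Serve values that the other side already pulled and set aside.
                yield mine.popleft()
                continue
            try:
                value = next(source)
            except StopIteration:
                return  # source exhausted and nothing buffered for this side
            result = bool(predicate(value))
            if result == category:
                yield value
            else:
                pending[result].append(value)

    return side(True), side(False)


calls = []  # records every value the predicate was asked about


def is_even(n):
    # Hypothetical predicate, only for illustration
    calls.append(n)
    return n % 2 == 0


evens, odds = divert(is_even, range(7))
first_even = next(evens)               # 0
first_odds = [next(odds), next(odds)]  # [1, 3]; 2 is buffered for evens
rest_evens = list(evens)               # [2, 4, 6]; 5 is buffered for odds
rest_odds = list(odds)                 # [5]
print(calls)                           # [0, 1, 2, 3, 4, 5, 6]
```

Each element passes through the predicate exactly once, whichever side happens to pull it from the source.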