Detecting cheapest way to build independent iterators

Suppose I'm writing a function that takes an iterable, and I want it to be agnostic as to whether that iterable is already an iterator or not.

(This is a common situation, right? I think basically all the itertools functions are written this way. Take in an iterable, return an iterator.)

If I call, for instance, itertools.tee(x, 2) on an object x that happens not to be an iterator yet, it would presumably be cheaper just to call iter on it twice to get my two independent iterators. Are itertools functions smart enough to know this, and if not, what's the best way to avoid unnecessary costs of this kind?

Solution

Observe:

>>> def foo(x):
...     return x.__iter__() # or return iter(x)
...
>>> l = [0, 1]
>>> it = l.__iter__()
>>> it
<list_iterator object at 0x00000190F59C3640>
>>> print(foo(l), foo(it))
<list_iterator object at 0x00000190F5980AF0> <list_iterator object at 0x00000190F59C3640>

So you do not need to worry whether the argument to your function is an iterable or already an iterator. You can call the __iter__ method (or, equivalently, iter()) on something that is already an iterator, and it just returns self in that case. This is not an expensive call, and it is cheaper than first testing whether the argument is an iterator, for example by checking for a __next__ method, and then having to call __iter__ on it anyway if it isn't.
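A minimal sketch of the point above. The helper names as_iterator and as_iterator_with_test are hypothetical, made up for illustration: the first simply calls iter(), the second tests first using collections.abc.Iterator. Both hand back the very same object when given an iterator, but the second only adds a check without ever saving the iter() call.

```python
from collections.abc import Iterator


def as_iterator(iterable):
    # No type test needed: iter() on an iterator returns that same object,
    # and on a container such as a list it returns a fresh iterator.
    return iter(iterable)


def as_iterator_with_test(iterable):
    # The "test first" alternative: it adds an isinstance() check but still
    # has to call iter() whenever the argument is not already an iterator.
    if isinstance(iterable, Iterator):
        return iterable
    return iter(iterable)


lst = [0, 1]
it = iter(lst)
print(as_iterator(it) is it)            # True: the iterator comes back unchanged
print(as_iterator(lst) is lst)          # False: a new list_iterator is created
print(as_iterator_with_test(it) is it)  # True: same result, one extra check
```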

Update

We now see that there is an important difference between passing your function an iterable and passing it an iterator (depending, of course, on how the iterator is written): calling iter twice on the former gives you two distinct iterators, while calling iter twice on the latter does not. itertools.tee, for example, expects an iterable. If you pass it an iterator whose __iter__ returns self, it will still work, since tee does not need two independent iterators to do its magic.
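Combining that identity check with tee gives one possible answer to the original question. In the sketch below, independent_iters is a hypothetical helper, not part of itertools. It calls iter() twice; if it gets two different objects it assumes the argument is a re-iterable container and uses those, otherwise it falls back to itertools.tee, which buffers items internally. The assumption that two distinct iterator objects are also independent holds for built-in containers such as lists, strings and dicts, but it is a heuristic and not guaranteed for arbitrary user classes.

```python
import itertools


def independent_iters(iterable, n=2):
    """Return n iterators over iterable, avoiding tee() when it isn't needed.

    Heuristic assumption: if iter() returns a new object on each call, those
    objects iterate independently, as they do for built-in containers.
    """
    if n < 2:
        raise ValueError("n must be at least 2")
    first = iter(iterable)
    second = iter(iterable)
    if first is not second:
        # Re-iterable container: just ask it for the remaining iterators too.
        return (first, second) + tuple(iter(iterable) for _ in range(n - 2))
    # Already an iterator (iter() returned self): tee() has to buffer items
    # so that each returned iterator still sees the whole sequence.
    return itertools.tee(first, n)


a, b = independent_iters([1, 2, 3])
print(list(a), list(b))  # [1, 2, 3] [1, 2, 3], via two plain list iterators

gen = (x * x for x in range(3))
c, d = independent_iters(gen)
print(list(c), list(d))  # [0, 1, 4] [0, 1, 4], via itertools.tee
```

As with any use of tee, the original iterator should not be advanced anywhere else once the copies have been made, or the copies will silently miss items.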

But if you are writing an iterator that is passed an iterable and is implemented by internally using two or more iterators over that iterable, what you really want to test is whether the argument supports multiple, concurrent, independent iterations, regardless of whether it is an iterator or a plain iterable:

def my_iterator(iterable):
    it1 = iter(iterable)
    it2 = iter(iterable)
    if it1 is it2:
        raise ValueError('The passed iterable does not support multiple, concurrent, independent iterations.')
    ...  # use it1 and it2 independently here

class Foo:
    def __init__(self, lst):
        self.lst = lst

    def __iter__(self):
        self.idx = 0
        return self  # every iter() call hands back this same object

    def __next__(self):
        if self.idx < len(self.lst):
            value = self.lst[self.idx]
            self.idx += 1
            return value
        raise StopIteration()

f = Foo("abcd")
for x in f:
    print(x)

my_iterator(f)

Prints:

a
b
c
d
Traceback (most recent call last):
  File "C:\Booboo\test\test.py", line 26, in <module>
    my_iterator(f)
  File "C:\Booboo\test\test.py", line 5, in my_iterator
    raise ValueError('The passed iterable does not support multiple, concurrent, independent iterations.')
ValueError: The passed iterable does not support multiple, concurrent, independent iterations.

The writer of the passed iterable must implement it so that it supports multiple, concurrent, independent iterations.
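One way to do that, sketched here as an illustrative rewrite of the Foo class above (not the only possible design), is to make __iter__ a generator method and drop __next__. Each call to iter() then creates a new generator with its own position, so the it1 is it2 check in my_iterator no longer fires, and nested loops over the same object work.

```python
class Foo:
    """Re-iterable rewrite of Foo: iter(f) returns a fresh iterator each time."""

    def __init__(self, lst):
        self.lst = lst

    def __iter__(self):
        # A generator method: every call builds a new generator object whose
        # loop state is local to that call, so iterations do not interfere.
        for value in self.lst:
            yield value


f = Foo("abcd")
it1 = iter(f)
it2 = iter(f)
print(it1 is it2)            # False: two distinct generator objects
print(next(it1), next(it1))  # a b
print(next(it2))             # a: it2 is unaffected by advancing it1
print(len([(x, y) for x in f for y in f]))  # 16: nested loops over one object
```

Because this Foo no longer defines __next__, it is an iterable rather than an iterator, which is the same split the built-in list and list_iterator types use.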