Suppose I'm writing a function taking in an iterable, and my function wants to be agnostic as to whether that iterable is actually an iterator yet or not.
(This is a common situation, right? I think basically all the itertools functions are written this way. Take in an iterable, return an iterator.)
If I call, for instance, itertools.tee(•, 2) on an object, and it happens to not be an iterator yet, that presumably means it would be cheaper just to call iter on it twice to get my two independent iterators. Are itertools functions smart enough to know this, and if not, what's the best way to avoid unnecessary costs in this way?
Observe:
>>> def foo(x):
...     return x.__iter__()  # or return iter(x)
...
>>> l = [0, 1]
>>> it = l.__iter__()
>>> it
<list_iterator object at 0x00000190F59C3640>
>>> print(foo(l), foo(it))
<list_iterator object at 0x00000190F5980AF0> <list_iterator object at 0x00000190F59C3640>
So you do not need to worry whether the argument to your function is an iterable or already an iterator. You can call __iter__ (or iter()) on something that is already an iterator, and it just returns self in that case. This is not an expensive call, and it is cheaper than any test for whether the object is an iterator, such as checking for a __next__ method (and then having to call __iter__ on it anyway if it doesn't have one).
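As a small illustration of the point above, here is a sketch of a function that simply calls iter() once and never checks what it was given. The name first_and_rest is made up for this example.

```python
def first_and_rest(iterable):
    """Hypothetical helper: return the first item and an iterator over the rest."""
    it = iter(iterable)   # a new iterator for a list; the same object for an iterator
    first = next(it)      # raises StopIteration if the input is empty
    return first, it


head_a, rest_a = first_and_rest([0, 1, 2])         # a list (an iterable)
head_b, rest_b = first_and_rest(iter([0, 1, 2]))   # already an iterator
results = (head_a, list(rest_a), head_b, list(rest_b))
```

Both calls behave the same way, so the function stays agnostic without any type test.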
Update
We now see that there is a big difference between passing your function an iterable and passing it an iterator (depending on how the iterator is written, of course): calling iter twice on the former gives you two distinct iterators, while calling iter twice on the latter does not. itertools.tee, as an example, expects an iterable. If you pass it an iterator whose __iter__ returns self, it will clearly still work, since tee does not need two independent iterators to do its magic.
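The difference can be seen in a short sketch. The variable names here are only for illustration; the list stands in for any re-iterable container.

```python
from itertools import tee

data = [0, 1, 2]

# Calling iter() twice on an iterable (a list) gives independent iterators.
a, b = iter(data), iter(data)
list_pair = (list(a), list(b))

# On an iterator, iter() returns the same object, so the "two" share state.
it = iter(data)
c, d = iter(it), iter(it)
same_object = c is d
shared = (next(c), next(d))  # the second call continues where the first stopped

# tee() buffers items, so each branch sees every item even from a one-shot iterator.
e, f = tee(iter(data), 2)
tee_pair = (list(e), list(f))
```

So tee is what you want when the argument may be a one-shot iterator, while two iter() calls are only safe on something that hands out a fresh iterator each time.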
But if you are writing an iterator that internally uses two or more iterators over the iterable it is passed, what you really want to test is whether the argument supports multiple, concurrent, independent iterations, regardless of whether it is an iterator or just a plain iterable:
def my_iterator(iterable):
    it1 = iter(iterable)
    it2 = iter(iterable)
    if it1 is it2:
        raise ValueError('The passed iterable does not support multiple, concurrent, independent iterations.')
    ...

class Foo:
    def __init__(self, lst):
        self.lst = lst

    def __iter__(self):
        self.idx = 0
        return self

    def __next__(self):
        if self.idx < len(self.lst):
            value = self.lst[self.idx]
            self.idx += 1
            return value
        raise StopIteration()

f = Foo("abcd")
for x in f:
    print(x)
my_iterator(f)
Prints:
a
b
c
d
Traceback (most recent call last):
  File "C:\Booboo\test\test.py", line 26, in <module>
    my_iterator(f)
  File "C:\Booboo\test\test.py", line 5, in my_iterator
    raise ValueError('The passed iterable does not support multiple, concurrent, independent iterations.')
ValueError: The passed iterable does not support multiple, concurrent, independent iterations.
The author of the passed iterable must therefore write it so that it supports multiple, concurrent, independent iterations, that is, so that each call to __iter__ returns a new iterator with its own state.
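As a rough sketch of both sides of this, the class below hands out a fresh iterator on every __iter__ call, and the helper falls back to itertools.tee instead of raising when it is given something one-shot. The names Letters and two_iterators are hypothetical, and the identity check is only a heuristic: distinct iterator objects are assumed, not proven, to be independent.

```python
from itertools import tee


class Letters:
    """Hypothetical re-iterable container: each __iter__ call starts a fresh pass."""

    def __init__(self, lst):
        self.lst = lst

    def __iter__(self):
        # A generator function returns a new generator object per call,
        # so every iteration keeps its own position.
        for value in self.lst:
            yield value


def two_iterators(iterable):
    """Hypothetical helper: return two iterators over the same items."""
    first = iter(iterable)
    second = iter(iterable)
    if first is second:
        # Same object twice: treat the argument as one-shot and let tee() buffer.
        return tee(first, 2)
    # Distinct objects: assume they are independent (a heuristic, not a guarantee).
    return first, second


a, b = two_iterators(Letters("abcd"))
next(a)                          # advancing one iterator...
remaining = (list(a), list(b))   # ...does not move the other

c, d = two_iterators(iter("abcd"))   # a plain iterator falls back to tee()
tee_result = (list(c), list(d))
```

With this approach the cheap path (two iter() calls) is used when the argument allows it, and tee is used only when it is actually needed.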