python iterator

Why can't I iterate twice over the same iterator? How can I "reset" the iterator or reuse the data?


Consider the code:

def test(data):
    for row in data:
        print("first loop")
    for row in data:
        print("second loop")

When data is an iterator, for example a list iterator or a generator expression*, this does not work:

>>> test(iter([1, 2]))
first loop
first loop
>>> test((_ for _ in [1, 2]))
first loop
first loop

This prints first loop twice, once for each element of data. However, it never prints second loop. Why does iterating over data work the first time, but not the second time? How can I make it work a second time?

Aside from for loops, the same problem occurs with any kind of iteration: list/set/dict comprehensions, passing the iterator to list(), sum() or functools.reduce(), etc.
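A minimal sketch of that behaviour, assuming Python 3; the variable names are only illustrative. Each consumer drains the iterator on its first use, so the second use sees nothing:

```python
from functools import reduce
import operator

it = iter([1, 2, 3])
first_list = list(it)    # consumes every element
second_list = list(it)   # the iterator is already exhausted
print(first_list)        # [1, 2, 3]
print(second_list)       # []

gen = (n * 2 for n in [1, 2, 3])
first_sum = sum(gen)     # 2 + 4 + 6
second_sum = sum(gen)    # nothing left to add
print(first_sum)         # 12
print(second_sum)        # 0

it = iter([1, 2, 3])
first_total = reduce(operator.add, it, 0)
second_total = reduce(operator.add, it, 0)  # only the initial value remains
print(first_total)       # 6
print(second_total)      # 0
```

Note that none of these calls raises an error; the second pass silently produces an empty result.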

On the other hand, if data is another kind of iterable, such as a list or a range (which are both sequences), both loops run as expected:

>>> test([1, 2])
first loop
first loop
second loop
second loop
>>> test(range(2))
first loop
first loop
second loop
second loop

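* More examples: file objects, generators created by calling a generator function, and the objects returned by map(), filter(), zip(), enumerate() and csv.reader().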


For general theory and terminology explanation, see What are iterator, iterable, and iteration?.

To detect whether the input is an iterator or a "reusable" iterable, see Ensure that an argument can be iterated twice.
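One common heuristic for that check, shown here as a sketch rather than as the approach the linked question necessarily recommends: by the iterator protocol, calling iter() on an iterator returns the iterator itself, whereas a reusable iterable returns a new iterator each time. The helper name is_iterator is hypothetical.

```python
def is_iterator(obj):
    """Return True if obj is a one-shot iterator (hypothetical helper).

    Relies on the protocol rule that an iterator's __iter__ returns itself;
    a class that violates the protocol would fool this check.
    """
    try:
        return iter(obj) is obj
    except TypeError:
        return False  # obj is not iterable at all


print(is_iterator(iter([1, 2])))          # True
print(is_iterator(n for n in [1, 2]))     # True
print(is_iterator([1, 2]))                # False
print(is_iterator(range(2)))              # False
```

This also hints at why lists and ranges work twice: each for loop calls iter() on them and receives a fresh iterator.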


Solution

  • An iterator can only be consumed once. For example:

    data = [1, 2, 3]
    it = iter(data)
    
    next(it)
    # => 1
    next(it)
    # => 2
    next(it)
    # => 3
    next(it)
    # raises StopIteration
    

    When the iterator is supplied to a for loop instead, the loop catches that final StopIteration and exits normally. Using the same iterator in a second for loop raises StopIteration immediately, because the iterator has already been consumed, so the loop body never runs.

    A simple way to work around this is to save all the elements to a list before anything else consumes the iterator. The list can then be traversed as many times as needed. For example:

    data = list(it)  # do this while it is still unconsumed
    

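    If the iterator yields many elements and the copies will be consumed at roughly the same pace, however, it's a better idea to create independent iterators using tee(), which avoids holding every element in memory at once:

    import itertools
    it = iter([1, 2, 3])  # a fresh, unconsumed iterator
    it1, it2 = itertools.tee(it, 2) # create as many as needed; do not use it directly afterwards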
    

    Now each one can be iterated over separately:

    next(it1)
    # => 1
    next(it1)
    # => 2
    next(it2)
    # => 1
    next(it2)
    # => 2
    next(it1)
    # => 3
    next(it2)
    # => 3