Tags: python, list, loops, optimization, chunks

How to iterate over a list in chunks


I have a Python script that takes a list of integers as input, which I need to work with four integers at a time. Unfortunately, I don't have control over the input, or I'd have it passed in as a list of four-element tuples. Currently, I'm iterating over it this way:

for i in range(0, len(ints), 4):
    # dummy op for example code
    foo += ints[i] * ints[i + 1] + ints[i + 2] * ints[i + 3]

It looks a lot like "C-think", though, which makes me suspect there's a more pythonic way of dealing with this situation. The list is discarded after iterating, so it needn't be preserved. Perhaps something like this would be better?

while ints:
    foo += ints[0] * ints[1] + ints[2] * ints[3]
    ints[0:4] = []

Still doesn't quite "feel" right, though. :-/

Update: With the release of Python 3.12, I've changed the accepted answer. For anyone who has not yet made (or cannot make) the jump to Python 3.12, I encourage you to check out the previous accepted answer or any of the other excellent, backwards-compatible answers below.

Related question: How do you split a list into evenly sized chunks in Python?


Solution

  • As of Python 3.12, the itertools module gains a batched function that specifically covers iterating over batches of an input iterable, where the final batch may be incomplete (each batch is a tuple). Per the example code given in the docs:

    >>> for batch in batched('ABCDEFG', 3):
    ...     print(batch)
    ...
    ('A', 'B', 'C')
    ('D', 'E', 'F')
    ('G',)
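
    Applied to the question's scenario, a minimal sketch (assuming, as the question's code does, that len(ints) is a multiple of four; otherwise the tuple unpacking fails on the shorter final batch):

        from itertools import batched  # Python 3.12+

        foo = 0
        for a, b, c, d in batched(ints, 4):  # each batch is a 4-tuple
            foo += a * b + c * d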
    

    Performance notes:

    The implementation of batched, like all itertools functions to date, is at the C layer, so it's capable of optimizations Python-level code cannot match, e.g. building each batch tuple directly in C with none of the per-item bytecode interpretation overhead a Python-level loop pays.

    Between all these advantages, it should massively outperform any Python-level solution (even highly optimized ones that push most or all of the per-item work to the C layer), regardless of whether the input iterable is long or short, and regardless of the batch size or the size of the final (possibly incomplete) batch.

    When itertools.batched is not available, zip_longest-based solutions using guaranteed-unique fill values for safety are the best option for almost all cases. They can suffer, though, in the pathological case of "few large batches, with the final batch mostly, but not completely, filled", especially pre-3.10, when bisect can't be used to optimize trimming the fill values off the final batch from an O(n) linear search down to an O(log n) binary search. batched avoids that search entirely, so it has no pathological cases at all.
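
    For pre-3.12 code, that zip_longest idiom looks roughly like the sketch below (chunked is a hypothetical helper name; the trimming step is the linear search described above):

        from itertools import zip_longest

        _SENTINEL = object()  # unique sentinel fill value; the input's ints can never equal it

        def chunked(iterable, n):
            # n references to the *same* iterator, so zip_longest pulls
            # n consecutive items to fill each output tuple
            args = [iter(iterable)] * n
            for batch in zip_longest(*args, fillvalue=_SENTINEL):
                if batch[-1] is _SENTINEL:
                    # Final, incomplete batch: trim the padding.
                    # This is the O(n) linear scan; on 3.10+, bisect with a
                    # key function could reduce it to O(log n).
                    batch = batch[:batch.index(_SENTINEL)]
                yield batch

        # Reusing the question's scenario (assumes complete groups of four,
        # or the tuple unpacking on the final batch would fail):
        foo = 0
        for a, b, c, d in chunked(ints, 4):
            foo += a * b + c * d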