I have a Python script which takes as input a list of integers, which I need to work with four integers at a time. Unfortunately, I don't have control of the input, or I'd have it passed in as a list of four-element tuples. Currently, I'm iterating over it this way:
```python
for i in range(0, len(ints), 4):
    # dummy op for example code
    foo += ints[i] * ints[i + 1] + ints[i + 2] * ints[i + 3]
```
It looks a lot like "C-think", though, which makes me suspect there's a more pythonic way of dealing with this situation. The list is discarded after iterating, so it needn't be preserved. Perhaps something like this would be better?
```python
while ints:
    foo += ints[0] * ints[1] + ints[2] * ints[3]
    ints[0:4] = []
```
Still doesn't quite "feel" right, though. :-/
Update: With the release of Python 3.12, I've changed the accepted answer. For anyone who has not (or cannot) make the jump to Python 3.12 yet, I encourage you to check out the previous accepted answer or any of the other excellent, backwards-compatible answers below.
Related question: How do you split a list into evenly sized chunks in Python?
As of Python 3.12, the `itertools` module gains a `batched` function that specifically covers iterating over batches of an input iterable, where the final batch may be incomplete (each batch is a `tuple`). Per the example code given in the docs:
```python
>>> for batch in batched('ABCDEFG', 3):
...     print(batch)
...
('A', 'B', 'C')
('D', 'E', 'F')
('G',)
```
Performance notes:

- The implementation of `batched`, like all `itertools` functions to date, is at the C layer, so it's capable of optimizations Python-level code cannot match, e.g.:
  - On each pull of a new batch, it proactively allocates a `tuple` of precisely the correct size (for all but the last batch), instead of building the `tuple` up element by element with amortized growth causing multiple reallocations (the way a solution calling `tuple` on an `islice` does).
  - It only needs to look up the `__next__` function of the underlying iterator once per batch, not `n` times per batch (the way a `zip_longest((iter(iterable),) * n)`-based approach does).
  - The check for the end case is a simple C-level `NULL` check (trivial, and required to handle possible exceptions anyway).
  - Handling the end case is a C `goto` followed by a direct `realloc` (no making a copy into a smaller `tuple`) down to the already known final size, since it's tracking how many elements it has successfully pulled (no complex "create sentinel for use as `fillvalue` and do Python-level `if`/`else` checks for each batch to see if it's empty, with the final batch requiring a search for where the `fillvalue` appeared last, to create the cut-down `tuple`" dance required by `zip_longest`-based solutions).

Between all these advantages, it should massively outperform any Python-level solution (even highly optimized ones that push most or all of the per-item work to the C layer), regardless of whether the input iterable is long or short, and regardless of the batch size and the size of the final (possibly incomplete) batch. `zip_longest`-based solutions using a guaranteed-unique `fillvalue` for safety are the best possible solution for almost all cases when `itertools.batched` is not available, but they can suffer in pathological cases of "few large batches, with the final batch mostly, but not completely, filled", especially pre-3.10 when `bisect` can't be used to optimize slicing off the `fillvalue`s from `O(n)` linear search down to `O(log n)` binary search; `batched` avoids that search entirelyely, so it won't experience pathological cases at all.