Motivated by discussion on this question, I decided to try some performance testing. The task I have set is somewhat simpler: given a source list `A`, we wish to create a lazy iterable that repeats each element of `A`, `N` times:
def test(implementation):
    A, N = list('abc'), 3
    assert list(implementation(A, N)) == list('aaabbbccc')
I came up with several implementations, and tested them thus:
from itertools import chain, repeat, starmap
from timeit import timeit
flatten = chain.from_iterable
def consume(iterable):
    for _ in iterable:
        pass

# FAST approaches
def tools(original, count):
    return flatten(map(repeat, original, repeat(count)))

def tools_star(original, count):
    return flatten(starmap(repeat, zip(original, repeat(count))))

def mixed(original, count):
    return flatten(repeat(a, count) for a in original)

# SLOW approaches
def mixed2(original, count):
    return (x for a in original for x in repeat(a, count))

def explicit(original, count):
    for a in original:
        for _ in range(count):
            yield a

def generator(original, count):
    return (a for a in original for _ in range(count))

def mixed3(original, count):
    return flatten((a for _ in range(count)) for a in original)

if __name__ == '__main__':
    for impl in (tools, tools_star, mixed, mixed2, explicit, generator, mixed3):
        for consumption in (consume, list):
            to_time = lambda: consumption(impl(list(range(1000)), 1000))
            elapsed = timeit(to_time, number=100)
            print(f'{consumption.__name__}({impl.__name__}): {elapsed:.2f}')
Here are three examples of timing results on my machine:
consume(tools): 1.10
list(tools): 2.96
consume(tools_star): 1.10
list(tools_star): 2.97
consume(mixed): 1.11
list(mixed): 2.91
consume(mixed2): 4.60
list(mixed2): 6.53
consume(explicit): 5.45
list(explicit): 8.09
consume(generator): 5.98
list(generator): 7.62
consume(mixed3): 5.75
list(mixed3): 7.67
consume(tools): 1.10
list(tools): 2.88
consume(tools_star): 1.10
list(tools_star): 2.89
consume(mixed): 1.11
list(mixed): 2.87
consume(mixed2): 4.56
list(mixed2): 6.39
consume(explicit): 5.42
list(explicit): 7.24
consume(generator): 5.91
list(generator): 7.48
consume(mixed3): 5.80
list(mixed3): 7.61
consume(tools): 1.14
list(tools): 2.98
consume(tools_star): 1.10
list(tools_star): 2.90
consume(mixed): 1.11
list(mixed): 2.92
consume(mixed2): 4.76
list(mixed2): 6.49
consume(explicit): 5.69
list(explicit): 7.38
consume(generator): 5.68
list(generator): 7.52
consume(mixed3): 5.75
list(mixed3): 7.86
And from this I draw the following conclusions:
The `itertools` tools offer a large performance boost, but only if we use them both to "flatten" the iterator (`itertools.chain.from_iterable` rather than flattening via a nested `for` expression) and to produce the sub-sequences (`itertools.repeat` rather than `range`). Using only `repeat` (as in `mixed2`) offers only a minor improvement, and using only `chain.from_iterable` (as in `mixed3`) actually seems to make things worse.
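For concreteness, here is a small sketch of my own (not part of the benchmark) showing that the fast pipeline is composed entirely of C-implemented iterators, so no Python-level frame has to run per output element while it is consumed:

from itertools import chain, repeat

# every stage is a C-implemented iterator: map() hands repeat objects
# directly to chain.from_iterable, so consuming the pipeline stays in C
pipeline = chain.from_iterable(map(repeat, 'abc', repeat(3)))
print(type(pipeline))  # <class 'itertools.chain'>
assert list(pipeline) == list('aaabbbccc')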
For the full `itertools` implementation, it does not seem to matter how we iterate over the input sequence, whether by using a generator expression, using `map`, or using `itertools.starmap`. (This is unsurprising, since only O(len(A)) interpreter-level operations occur here, rather than O(len(A) * N). The `starmap` approach is unwieldy and definitely not what I'd recommend, but I included it because the code from the original motivating discussion used it.)
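To illustrate the O(len(A)) point, here is a quick sketch (my own, using a hypothetical `counting_repeat` wrapper) showing that the interpreter-level work scales with the length of the input, not with the size of the output:

from itertools import chain, repeat

calls = 0

def counting_repeat(value, count):
    # hypothetical wrapper, only here to count interpreter-level calls
    global calls
    calls += 1
    return repeat(value, count)

out = list(chain.from_iterable(map(counting_repeat, 'abc', repeat(3))))
assert out == list('aaabbbccc')
assert calls == 3  # one call per element of A, not one per output item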
The amount of overhead added by creating a list from the iterable seems to vary wildly, both across methods and across timing runs (note the difference in the `list(explicit)` results across runs), though it seems more consistent for the fast methods. This is especially strange since each test already sums the results of multiple list creations.
What exactly is going on under the hood of `itertools`? How can we explain these timing results? It's especially strange the way that `chain.from_iterable` and `repeat` don't offer incremental performance benefits here, but rely on each other entirely. And what is going on with the list construction? Isn't the added overhead the same in each case (repeatedly appending the same sequence of elements to an empty list)?
---

It mostly boils down to the Big-O of the amount of time spent in the interpreter:

- The all-`itertools` pipelines do their per-element work in C, leaving only a small constant amount of interpreter work, marked `I` in the table below.
- `repeat` is a few opcodes shorter than storing a value from `range` before yielding, marked `Y` in the table below.
- `explicit` and `generator` are virtually equivalent; the extra cost of creating a fresh inner generator for every element (as `mixed3` does) is marked `G` in the table below.

Here are my results:
| method | complexity (interpreter only) | list | consume |
|---|---|---|---|
| tools | O(0) + I | 0.21 | 0.09 |
| tools_star | O(0) + I | 0.21 | 0.09 |
| mixed | O(N) | 0.20 | 0.09 |
| mixed2 | O(N²) | 0.54 | 0.47 |
| explicit | O(N²) + Y | 0.64 | 0.60 |
| generator | O(N²) + Y | 0.64 | 0.60 |
| mixed3 | O(N²) + G | 0.71 | 0.65 |
Seems pretty convincing.
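To see the per-item bytecode these annotations refer to, one way is to disassemble the generator expressions directly (a CPython-specific sketch of mine; the exact opcodes vary by version):

import dis
from itertools import repeat

# each of these executes a handful of opcodes per output element;
# the tools/tools_star pipelines run no Python bytecode at all while consumed
dis.dis((a for a in 'ab' for _ in range(2)).gi_code)      # generator-style inner loop
dis.dis((x for a in 'ab' for x in repeat(a, 2)).gi_code)  # mixed2-style inner loop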
Also, I've modified the code a little to improve the timing method and readability:

import timeit  # the module itself this time, since we call timeit.repeat below

PRODUCERS = (tools, tools_star, mixed, mixed2, explicit, generator, mixed3)
CONSUMERS = (list, consume)
N = 1000
SAMPLES = 50
BATCH = 10

if __name__ == '__main__':
    for consumer in CONSUMERS:
        print(f"{consumer.__name__}:")
        for producer in PRODUCERS:
            to_time = lambda: consumer(producer(list(range(N)), N))
            elapsed = timeit.repeat(to_time, repeat=SAMPLES, number=BATCH)
            emin = min(elapsed)  # the minimum filters out random fluctuations, approximating the best theoretical time
            print(f'{emin:02.2f} | {producer.__name__}')
        print()
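One small addition worth making to the snippet above (my suggestion, assuming `PRODUCERS` is in scope): sanity-check every producer against the expected output before timing it, so a broken implementation can't post a fast time:

# verify correctness before benchmarking
for producer in PRODUCERS:
    assert list(producer(list('abc'), 3)) == list('aaabbbccc'), producer.__name__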