I have a list
of int
s (ints
) and I want to print each one in their own line as fast as possible.
My first shot was this:
list(map(print,ints))
and running it on 107 ints together with a function to read them from standard in this takes an average of 1708 ms from 30 repetitions.
As this fairly well-received answer points out, using list()
to force-expand a map
-result is inefficient because it has to create a long list of irrelevant data, specifically in our case a list of None
values, returned by print
. That makes intuitive sense, right?
So I tried a number of other approaches:
# n print calls
list(map(print,ints)) # map
for n in ints: print(n) # loop
for i in range(0,len(ints)): print(ints[i]) # range
[print(n) for n in ints] # compr
deque(map(print,ints),0) # deque
# one print call
print('\n'.join(map(str,ints))) # cat
print(*ints, sep='\n') # sep
Measured in-situ together with ints = list(map(int,sys.stdin.readlines()))
the following are the runtimes.
map 1707 ms
loop 1926 ms
range 2027 ms
compr 1828 ms
deque 1699 ms
--------------
cat 1084 ms
sep 1462 ms
This is not a fluke. I don't want to overload this question with too much statistics, but these times are not outliers.
This is entirely counter intuitive. What I would have guessed to be the fastest (from the n-print set) is significantly slower than my first version ('map') despite that one creating a [None,...,None]
list with ten million entries. Does python optimise that list construction out?
List comprehension is in between and not hugely surprising, but the most astonishing is that in 'cat' creating a huge str
and calling print
on that is by far the fastest. I would have assumed that the memory allocations on that one str would have destroyed the runtime entirely, even if having only one io operation looks better on paper (note that flush=False
is the default).
Main question: Why is the 'map' variant so much faster than other variants with n print calls?
And why is 'cat' so much faster yet?
Example map.py:
#!/usr/bin/python3
import sys
if __name__ == '__main__':
ints = list(map(int,sys.stdin.readlines()))
list(map(print,ints))
Run as
/usr/bin/time -f '%e' -o map.run-time -a ./map.py < input.log | wc -l > /dev/null
list(map(print,ints))
does take extra time to build a large list, but all the hard work is done in C.
[print(n) for n in ints]
does the same but in Python, which is slower than C.
for n in ints: print(n)
doesn't build a list, but it's still interpreting Python, and evidently that's more detrimental than building the list. Also, n
is a global variable. That's probably why it's slower than the list comprehension. You should see some speedup if you did it inside a function so it'd be a faster local variable.
for i in range(0,len(ints)): print(ints[i])
is just silly, that's trying to be slow with all the index objects and list subscriptions.
deque(map(print,ints),0)
uses fastest possible consumption of the map
iterator and doesn't build a large list, so it's fastest. I'm actually a bit disappointed its advantage is so small.
You can see more details by looking at the bytecode. For example import dis; dis.dis('list(map(print,ints))')
shows that it just loads and calls the relevant pieces:
1 LOAD_NAME 0 (list)
PUSH_NULL
LOAD_NAME 1 (map)
PUSH_NULL
LOAD_NAME 2 (print)
LOAD_NAME 3 (ints)
CALL 2
CALL 1
Whereas the "loop" code has a Python-level loop that requires lots of interpreter steps:
1 LOAD_NAME 0 (ints)
GET_ITER
L1: FOR_ITER 11 (to L2)
STORE_NAME 1 (n)
LOAD_NAME 2 (print)
PUSH_NULL
LOAD_NAME 1 (n)
CALL 1
POP_TOP
JUMP_BACKWARD 13 (to L1)
L2: END_FOR
POP_TOP