This is the disassembly of a list comprehension in python-3.10:
Python 3.10.12 (main, Jun 11 2023, 05:26:28) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import dis
>>>
>>> dis.dis("[True for _ in ()]")
1 0 LOAD_CONST 0 (<code object <listcomp> at 0x7fea68e0dc60, file "<dis>", line 1>)
2 LOAD_CONST 1 ('<listcomp>')
4 MAKE_FUNCTION 0
6 LOAD_CONST 2 (())
8 GET_ITER
10 CALL_FUNCTION 1
12 RETURN_VALUE
Disassembly of <code object <listcomp> at 0x7fea68e0dc60, file "<dis>", line 1>:
1 0 BUILD_LIST 0
2 LOAD_FAST 0 (.0)
>> 4 FOR_ITER 4 (to 14)
6 STORE_FAST 1 (_)
8 LOAD_CONST 0 (True)
10 LIST_APPEND 2
12 JUMP_ABSOLUTE 2 (to 4)
>> 14 RETURN_VALUE
From what I understand, it creates a code object called <listcomp> which does the actual iteration and returns the result list, and then immediately calls it.
I can't figure out why a separate function is needed for this job. Is this some kind of optimization trick?
The main reason for creating a function is to isolate the comprehension's iteration variable (peps.python.org).
By creating a function:
Comprehension iteration variables remain isolated and don’t overwrite a variable of the same name in the outer scope, nor are they visible after the comprehension
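The isolation described in the quote can be checked directly. The snippet below is a minimal sketch: the first part shows that an outer `x` survives a comprehension that reuses the name, and the second part is a rough hand-written approximation of the function that Python before 3.12 generated. The helper name `_listcomp` is hypothetical and only chosen for illustration; the real compiler-generated function is named `<listcomp>` and takes the iterator as a hidden argument `.0`.

```python
x = "outer"

# The comprehension reuses the name x as its iteration variable.
squares = [x * x for x in range(4)]

# The outer x is untouched once the comprehension has finished.
assert x == "outer"


# Rough approximation of the pre-3.12 compilation strategy
# (hypothetical helper name, not the real generated code).
def _listcomp(iterator):
    result = []
    for x in iterator:  # this x is local to the helper function
        result.append(x * x)
    return result


# The caller builds the iterator and immediately calls the helper,
# mirroring GET_ITER followed by CALL_FUNCTION in the 3.10 disassembly.
emulated = _listcomp(iter(range(4)))
print(squares, emulated, x)
```

Because the loop variable lives in the helper's own local scope, nothing leaks into the caller, which is the same effect the function-based implementation relied on.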
However, this is inefficient at runtime. For this reason, python-3.12 implemented an optimization called comprehension inlining (PEP 709, peps.python.org), which no longer creates a separate code object.
Dictionary, list, and set comprehensions are now inlined, rather than creating a new single-use function object for each execution of the comprehension. This speeds up execution of a comprehension by up to two times. See PEP 709 for further details.
Here is the output for the same code disassembled with python-3.12:
>>> import dis
>>> dis.dis("[True for _ in ()]")
0 0 RESUME 0
1 2 LOAD_CONST 0 (())
4 GET_ITER
6 LOAD_FAST_AND_CLEAR 0 (_)
8 SWAP 2
10 BUILD_LIST 0
12 SWAP 2
>> 14 FOR_ITER 4 (to 26)
18 STORE_FAST 0 (_)
20 LOAD_CONST 1 (True)
22 LIST_APPEND 2
24 JUMP_BACKWARD 6 (to 14)
>> 26 END_FOR
28 SWAP 2
30 STORE_FAST 0 (_)
32 RETURN_VALUE
>> 34 SWAP 2
36 POP_TOP
38 SWAP 2
40 STORE_FAST 0 (_)
42 RERAISE 0
ExceptionTable:
10 to 26 -> 34 [2]
As you can see, there is no longer a MAKE_FUNCTION opcode nor a separate code object. Instead, python-3.12 uses the LOAD_FAST_AND_CLEAR (docs.python.org; at offset 6) and STORE_FAST (at offset 30) opcodes to provide the isolation for the iteration variable.
Quoting from the Specification section (peps.python.org) of PEP 709:
Isolation of the x iteration variable is achieved by the combination of the new LOAD_FAST_AND_CLEAR opcode at offset 6, which saves any outer value of x on the stack before running the comprehension, and 30 STORE_FAST, which restores the outer value of x (if any) after running the comprehension.
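To see which strategy a given interpreter uses without reading the full disassembly, one can inspect the compiled code object programmatically. This is only a sketch: the filename `<demo>` and the variable names are arbitrary, and it assumes the comprehension is compiled at module/eval level as in the examples above.

```python
import dis
import sys
import types

SOURCE = "[True for _ in ()]"
code = compile(SOURCE, "<demo>", "eval")

# Collect the opcode names used by the outer code object.
opnames = {ins.opname for ins in dis.get_instructions(code)}

# Before 3.12 the comprehension body is a nested code object stored in co_consts.
has_nested_code = any(isinstance(c, types.CodeType) for c in code.co_consts)

print("Python", ".".join(map(str, sys.version_info[:2])))
print("MAKE_FUNCTION present:      ", "MAKE_FUNCTION" in opnames)
print("LOAD_FAST_AND_CLEAR present:", "LOAD_FAST_AND_CLEAR" in opnames)
print("nested code object present: ", has_nested_code)
```

On an interpreter older than 3.12 this should report a nested code object and a MAKE_FUNCTION opcode; on 3.12 and later it should report neither.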
In addition to that, in python-3.12 there is no longer a separate frame for the comprehension in tracebacks.
[Screenshots omitted: traceback in <python-3.12 vs. traceback in python-3.12]
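The difference in tracebacks can be reproduced with a small script. This is a sketch under the assumption that the comprehension lives inside a regular function; the function names `build` and `frame_names` are made up for the example.

```python
import sys
import traceback


def build():
    # The division by zero is raised inside the comprehension body.
    return [1 / 0 for _ in range(1)]


def frame_names():
    try:
        build()
    except ZeroDivisionError as exc:
        # Names of the frames the exception travelled through, outermost first.
        return [frame.name for frame in traceback.extract_tb(exc.__traceback__)]
    return []


names = frame_names()
print(sys.version_info[:2], names)
```

Before 3.12 the list ends with an extra `<listcomp>` entry; from 3.12 on, the innermost frame is `build` itself.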
And here are the benchmark results (peps.python.org), measured on an M2 macOS machine:
$ python3.10 -m pyperf timeit -s 'l = [1]' '[x for x in l]'
Mean +- std dev: 108 ns +- 3 ns
$ python3.12 -m pyperf timeit -s 'l = [1]' '[x for x in l]'
Mean +- std dev: 60.9 ns +- 0.3 ns
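If pyperf is not installed, a rougher version of the same measurement can be made with the standard library's `timeit` module. This is only a sketch: `timeit` is less careful about system noise than pyperf, so the absolute numbers will differ from the ones above, and the repeat and loop counts here are arbitrary choices.

```python
import timeit

LOOPS = 100_000

# Same statement and setup as the pyperf command lines above.
timings = timeit.repeat(
    stmt="[x for x in l]",
    setup="l = [1]",
    repeat=5,
    number=LOOPS,
)

# Report the best run, converted to nanoseconds per execution.
best_ns = min(timings) / LOOPS * 1e9
print(f"best of {len(timings)}: {best_ns:.1f} ns per loop")
```

Running the script under each interpreter (for example `python3.10` and `python3.12`) gives a comparable pair of numbers for that machine.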