pythonlist-comprehensioncpythonpython-internalsdis

Why list comprehensions create a function internally?


This is disassembly of a list comprehension in :

Python 3.10.12 (main, Jun 11 2023, 05:26:28) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import dis
>>> 
>>> dis.dis("[True for _ in ()]")
  1           0 LOAD_CONST               0 (<code object <listcomp> at 0x7fea68e0dc60, file "<dis>", line 1>)
              2 LOAD_CONST               1 ('<listcomp>')
              4 MAKE_FUNCTION            0
              6 LOAD_CONST               2 (())
              8 GET_ITER
             10 CALL_FUNCTION            1
             12 RETURN_VALUE

Disassembly of <code object <listcomp> at 0x7fea68e0dc60, file "<dis>", line 1>:
  1           0 BUILD_LIST               0
              2 LOAD_FAST                0 (.0)
        >>    4 FOR_ITER                 4 (to 14)
              6 STORE_FAST               1 (_)
              8 LOAD_CONST               0 (True)
             10 LIST_APPEND              2
             12 JUMP_ABSOLUTE            2 (to 4)
        >>   14 RETURN_VALUE

From what I understand it creates a code object called listcomp which does the actual iteration and return the result list, and immediately call it. I can't figure out the need to create a separate function to execute this job. Is this kind of an optimization trick?


Solution

  • The main logic of creating a function is to isolate the comprehension’s iteration variablepeps.python.org.

    By creating a function:

    Comprehension iteration variables remain isolated and don’t overwrite a variable of the same name in the outer scope, nor are they visible after the comprehension

    However, this is inefficient at runtime. Due to this reason, implemented an optimization called comprehension inlining(PEP 709)peps.python.org which will no longer create a separate code objectpeps.python.org.

    Dictionary, list, and set comprehensions are now inlined, rather than creating a new single-use function object for each execution of the comprehension. This speeds up execution of a comprehension by up to two times. See PEP 709 for further details.

    Here is the output for the same code disassembled with :

    >>> import dis
    >>> dis.dis("[True for _ in ()]")
      0           0 RESUME                   0
    
      1           2 LOAD_CONST               0 (())
                  4 GET_ITER
                  6 LOAD_FAST_AND_CLEAR      0 (_)
                  8 SWAP                     2
                 10 BUILD_LIST               0
                 12 SWAP                     2
            >>   14 FOR_ITER                 4 (to 26)
                 18 STORE_FAST               0 (_)
                 20 LOAD_CONST               1 (True)
                 22 LIST_APPEND              2
                 24 JUMP_BACKWARD            6 (to 14)
            >>   26 END_FOR
                 28 SWAP                     2
                 30 STORE_FAST               0 (_)
                 32 RETURN_VALUE
            >>   34 SWAP                     2
                 36 POP_TOP
                 38 SWAP                     2
                 40 STORE_FAST               0 (_)
                 42 RERAISE                  0
    ExceptionTable:
      10 to 26 -> 34 [2]

    As you can see, there is no longer a MAKE_FUNCTION opcode nor a separate code object. Instead uses LOAD_FAST_AND_CLEARdocs.python.org(at offset 6) and STORE_FAST(at offset 30) opcodes to provide the isolation for the iteration variable.

    Quoting from the Specification sectionpeps.python.org of the PEP 709:

    Isolation of the x iteration variable is achieved by the combination of the new LOAD_FAST_AND_CLEAR opcode at offset 6, which saves any outer value of x on the stack before running the comprehension, and 30 STORE_FAST, which restores the outer value of x (if any) after running the comprehension.

    In addition to that, in there is no longer a separate frame for the comprehension in tracebacks.

    Traceback in < Traceback in
    >>> [1 / 0 for i in range(10)]
    Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    File "<stdin>", line 1, in <listcomp>
    ZeroDivisionError: division by zero
    >>> [1 / 0 for i in range(10)]
    Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    ZeroDivisionError: division by zero

    And here is the benchmark resultspeps.python.org(measured with MacOS M2):

    $ python3.10 -m pyperf timeit -s 'l = [1]' '[x for x in l]'
    Mean +- std dev: 108 ns +- 3 ns
    $ python3.12 -m pyperf timeit -s 'l = [1]' '[x for x in l]'
    Mean +- std dev: 60.9 ns +- 0.3 ns