python gil python-3.13

Python 3.13 with free-threading is slow


I was trying the new free-threaded build of the interpreter, but found that it actually takes longer than the GIL-enabled build. I did observe that CPU usage increases a lot with the free-threaded interpreter. Is there something I misunderstand about this new interpreter?

Version downloaded: python-3.13.0rc2-amd64

Code:

from concurrent.futures import ThreadPoolExecutor
from random import randint

import time


def create_table(size):
    a, b = size
    table = []
    for i in range(0, a):
        row = []
        for j in range(0, b):
            row.append(randint(0, 100))
        table.append(row)
    return table


if __name__ == "__main__":
    start = time.perf_counter()
    with ThreadPoolExecutor(4) as pool:
        result = pool.map(create_table, [(1000, 10000) for _ in range(10)])
    end = time.perf_counter()
    print(end - start, *[len(each) for each in result])

Timings:

python3.13t (free-threaded) takes 56 sec
python3.13 takes 26 sec
python3.12 takes 25 sec


Solution

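  • The primary culprit appears to be randint. The module-level functions in the random module are bound methods of a single hidden Random instance, so every thread calls into the same shared generator, and in the free-threaded build the threads appear to contend for it. Another problem is that the pool has only 4 workers for 10 tables. With equal-sized tasks the work runs in roughly three waves of 4, 4 and 2, so two workers sit idle during the last one.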

    Here is the code with the randint problem addressed by giving each create_table call its own SystemRandom instance, so that no generator is shared between threads:

    from concurrent.futures import ThreadPoolExecutor
    from random import SystemRandom
    
    import time
    
    
    def create_table(size):
        a, b = size
        table = []
        random = SystemRandom()  # private generator for this call, not shared across threads
        for i in range(0, a):
            row = []
            for j in range(0, b):
                row.append(random.randint(0, 100))
            table.append(row)
        return table
    
    
    if __name__ == "__main__":
        start = time.perf_counter()
        with ThreadPoolExecutor(4) as pool:
            result = pool.map(create_table, [(1000, 10000) for _ in range(10)])
        end = time.perf_counter()
        print(end - start, *[len(each) for each in result])
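
If OS-level randomness is not required, a plain random.Random instance per call is another way to avoid the shared generator. This is only a sketch of that swap and has not been benchmarked here. build_tables is a hypothetical helper, and the small table size in the demo is an assumption chosen to keep it quick. SystemRandom draws its bits from os.urandom, whereas Random is the ordinary Mersenne Twister generator, seeded once when it is created.

```python
from concurrent.futures import ThreadPoolExecutor
from random import Random


def create_table(size):
    """Build an a-by-b table of random ints with a generator private to this call."""
    a, b = size
    rng = Random()  # own state, so nothing is shared with other threads
    return [[rng.randint(0, 100) for _ in range(b)] for _ in range(a)]


def build_tables(count, size, workers=4):
    """Hypothetical helper: run `count` create_table calls on `workers` threads."""
    with ThreadPoolExecutor(workers) as pool:
        return list(pool.map(create_table, [size] * count))


if __name__ == "__main__":
    tables = build_tables(10, (100, 1000))  # smaller than the question's tables
    print(len(tables), len(tables[0]), len(tables[0][0]))
```

Whether this beats SystemRandom on a given machine is something to measure rather than assume. Note that Random is not suitable for security-sensitive values.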
    

    And here is a version that manages the threads directly with threading.Thread, which is more flexible about thread creation and avoids unnecessary inter-thread communication. Note that it builds 4 tables, one per thread, rather than 10:

    import threading
    from random import SystemRandom
    
    import time
    
    
    def create_table(obj, result: list[list[int]]):
        a, b = obj
        print(f"Starting thread {threading.current_thread().name}")
        random = SystemRandom()
        result[:] = [[random.randint(0, 100) for j in range(b)] for i in range(a)]
        print(f"Finished thread {threading.current_thread().name}")
    
    
    if __name__ == "__main__":
        start = time.perf_counter()
        obj = (1000, 10000)
        results: list[list[list[int]]] = []
        threads: list[threading.Thread] = []
        for _ in range(4):
            result: list[list[int]] = []
            thread = threading.Thread(target=create_table, args=(obj, result))
            thread.start()
            threads.append(thread)
            results.append(result)
        for thread in threads:
            thread.join()
        end = time.perf_counter()  # stop the clock before printing
        print([len(r) for r in results])
        print(end - start)
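
The 4-4-2 point from the start of this answer can be checked with a little arithmetic. The sketch below assumes every task takes the same time, which is an idealization: a real ThreadPoolExecutor hands the next task to whichever worker becomes free first. wave_sizes is a hypothetical helper written for this illustration.

```python
def wave_sizes(tasks, workers):
    """Sizes of successive waves when equal-length tasks run on a fixed pool."""
    full, rest = divmod(tasks, workers)
    # `full` waves keep every worker busy; a final partial wave holds the remainder.
    return [workers] * full + ([rest] if rest else [])


if __name__ == "__main__":
    print(wave_sizes(10, 4))  # [4, 4, 2]
    print(wave_sizes(10, 5))  # [5, 5]
```

Under that assumption, 10 tables on 4 workers need three waves, while 5 workers would need only two, provided the machine has enough cores to run them in parallel.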