Tags: python, concurrency, multiprocessing, threadpool, python-multithreading

ThreadPoolExecutor too fast for CPU bound task


I'm trying to understand how ThreadPoolExecutor and ProcessPoolExecutor work. My assumption for this test was that a CPU-bound task such as incrementing a counter wouldn't benefit from running on a ThreadPoolExecutor, because it doesn't release the GIL, so only one thread can run at a time.

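import time
from random import random
from concurrent.futures import ThreadPoolExecutor


# measure_execution_time is a timing decorator defined elsewhere (not shown here).
@measure_execution_time
def cpu_slow_function(item):
    start = time.time()
    duration = random()  # random deadline in [0.0, 1.0) seconds
    counter = 1
    # Spin until the wall-clock deadline passes, counting iterations.
    while time.time() - start < duration:
        counter += 1
    return item, counter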


def test_thread_pool__cpu_bound():
    """
    100 tasks of average .5 seconds each, would take 50 seconds to complete sequentially.
    """

    items = list(range(100))

    with ThreadPoolExecutor(max_workers=100) as executor:
        results = list(executor.map(cpu_slow_function, items))

    for index, (result, counter) in enumerate(results):
        assert result == index
        assert counter >= 0.0

To my surprise, this test takes only about 5s to finish. Based on my assumptions, it should take about 50s: 100 tasks averaging 0.5s each.

What am I missing?


Solution

  • The GIL does not prevent Python threads from running concurrently. It only prevents more than one thread from executing Python bytecode at any given moment.

    At any given moment, your program has one worker that is actually executing bytecode and 99 workers waiting for their turn to execute their next instruction. In the meantime, the time.time() clock keeps ticking (i.e., real time passes) for all of them, so every loop reaches its deadline at about the same wall-clock time, regardless of how little CPU time it received.
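
The effect described above can be reproduced with a smaller experiment. The following is a minimal sketch, not the asker's original test: the helper names spin_until and run_in_threads are hypothetical, and it uses time.monotonic() instead of time.time() so the measurement is unaffected by system clock adjustments. Each task spins until a wall-clock deadline, so several tasks in a thread pool should finish in roughly one deadline's worth of time rather than the sum of all deadlines.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def spin_until(duration):
    """Busy-loop until `duration` seconds of wall-clock time have passed."""
    start = time.monotonic()
    counter = 0
    while time.monotonic() - start < duration:
        counter += 1
    return counter


def run_in_threads(n_tasks, duration):
    """Run n_tasks deadline-based loops in threads; return (elapsed, counters)."""
    begin = time.monotonic()
    with ThreadPoolExecutor(max_workers=n_tasks) as executor:
        counters = list(executor.map(spin_until, [duration] * n_tasks))
    return time.monotonic() - begin, counters


if __name__ == "__main__":
    n_tasks, duration = 4, 0.2
    elapsed, counters = run_in_threads(n_tasks, duration)
    # The deadlines overlap in real time, so the measured time stays close
    # to a single `duration` rather than n_tasks * duration.
    print(f"sequential estimate: {n_tasks * duration:.2f}s, measured: {elapsed:.2f}s")
    # Each thread only gets a share of the interpreter, so each task is
    # expected to complete fewer iterations than it would running alone.
    print(f"iterations per task: {counters}")
```

The iteration counts are the more honest measure of CPU work here: the threads share one interpreter, so the per-task counts should shrink as more threads are added, even though the elapsed time barely changes. To benchmark truly CPU-bound work, a task that performs a fixed number of iterations (rather than looping until a deadline) would likely be a better fit.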