python multiprocessing tqdm

How to add a progress bar to Python multiprocessing.Pool without slowing it down (TQDM is 7x slower)?


I'm running a long computation using multiprocessing.Pool in Python, and I wanted to add a progress bar to track it. Naturally, I tried tqdm with pool.imap, but surprisingly, it's ~7.3x slower than using pool.map() without it.

Here's a minimal example to reproduce the issue:

import numpy as np
import time
from multiprocessing import Pool, cpu_count
from functools import partial
from tqdm import tqdm  


def dummy_task(step_id, size=500):
    data = np.random.randn(size, 3)
    dist = np.linalg.norm(data, axis=1)
    return step_id, np.min(dist)


if __name__ == "__main__":
    steps = list(range(500000))  # Simulate 500,000 steps
    size = 500

    print("Running normally...")
    t0 = time.time()
    with Pool(processes=cpu_count()) as pool:
        results = pool.map(partial(dummy_task, size=size), steps)
    t1 = time.time()
    pt = t1 - t0
    print(f"Time taken: {pt:.3f} seconds\n")

    print("Running with tqdm...")
    t2 = time.time()
    with Pool(processes=cpu_count()) as pool:
        results = list(tqdm(pool.imap(partial(dummy_task, size=size), steps), total=len(steps)))
    t3 = time.time()

    pt_tqm = t3 - t2
    print(f"Time taken: {pt_tqm:.3f} seconds")
    print(f"Pool Process with TQDM is {pt_tqm/pt:.3f} times slower.")

There is a similar question here from six years ago, but it has no usable answer. I'm looking for a way to show a progress bar, or some other form of progress information, without sacrificing speed. Is there any way to achieve this?

NB: I only used imap in the tqdm example because I could not figure out how to use map with tqdm.


Solution

  • Your linked question contains the real explanation in an answer, which relates to the difference between map and imap as Paul Smith suggests:

    imap doesn't turn the iterable you give it into a list, nor does break it into chunks (by default). It will iterate over the iterable one element at a time, and send them each to a worker process.

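    Chunking is known to be crucial to performance for lightweight jobs like this one, where each task takes far less time than the inter-process communication needed to dispatch it. The slowdown therefore comes from imap's default chunksize of 1, not from tqdm.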

    To add a progress bar without slowing the calculation down, set your own chunksize argument:

    with Pool(processes=cpu_count()) as pool:
        results = list(tqdm(pool.imap(partial(dummy_task, size=size), steps, chunksize=10), total=len(steps)))

    (There might be an optimal chunksize, I haven't tested.)
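
If you want to look for a good chunksize empirically, a rough sketch along these lines could help. It uses only the standard library; cheap_task is a hypothetical stand-in for dummy_task, and the item count, worker count, and candidate chunksizes are arbitrary assumptions, so substitute your own workload before trusting any numbers.

```python
import time
from multiprocessing import Pool


def cheap_task(step_id):
    # Hypothetical stand-in for dummy_task: deliberately lightweight so that
    # inter-process overhead, not computation, dominates the run time.
    return step_id, (step_id * step_id) % 7


def time_imap(chunksize, n_items=20_000, processes=2):
    """Return (elapsed_seconds, results) for one imap run at the given chunksize."""
    start = time.perf_counter()
    with Pool(processes=processes) as pool:
        results = list(pool.imap(cheap_task, range(n_items), chunksize=chunksize))
    return time.perf_counter() - start, results


if __name__ == "__main__":
    # Arbitrary candidate values; timings depend heavily on machine and task.
    for chunksize in (1, 10, 100, 1000):
        elapsed, _ = time_imap(chunksize)
        print(f"chunksize={chunksize:>5}: {elapsed:.3f} s")
```

Keep in mind that the progress bar only advances as whole chunks complete, so a very large chunksize makes the bar update in coarse jumps.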

    To quote the docs:

    The chunksize argument is the same as the one used by the map() method. For very long iterables using a large value for chunksize can make the job complete much faster than using the default value of 1.
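
For a sense of scale, here is a small sketch of how map chooses its chunksize when you do not pass one. It mirrors the heuristic in current CPython's multiprocessing/pool.py as far as I can tell; that is an implementation detail rather than documented behaviour, so treat it as an assumption that may change between versions. The function name approx_map_chunksize is made up for illustration.

```python
def approx_map_chunksize(n_items, n_workers):
    """Approximate the chunksize Pool.map picks when chunksize is None.

    Assumption: mirrors CPython's heuristic of aiming for roughly four
    chunks per worker, rounding the chunk size up.
    """
    if n_items == 0:
        return 0
    chunksize, extra = divmod(n_items, n_workers * 4)
    if extra:
        chunksize += 1
    return chunksize


if __name__ == "__main__":
    # The question's workload: 500,000 steps, here assuming 8 worker processes.
    print(approx_map_chunksize(500_000, 8))
```

On an 8-worker pool, map would send the 500,000 steps in chunks of about 15,625, while imap with its default sends them one at a time. That gap in dispatch overhead is consistent with the slowdown in the question.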