I'm running a long computation using multiprocessing.Pool in Python, and I wanted to add a progress bar to track it. Naturally, I tried tqdm with pool.imap, but surprisingly, it's ~7.3x slower than using pool.map() without it.
Here's a minimal example to reproduce the issue:
import numpy as np
import time
from multiprocessing import Pool, cpu_count
from functools import partial
from tqdm import tqdm
def dummy_task(step_id, size=500):
    data = np.random.randn(size, 3)
    dist = np.linalg.norm(data, axis=1)
    return step_id, np.min(dist)
if __name__ == "__main__":
    steps = list(range(500000))  # 500,000 lightweight tasks
    size = 500

    print("Running normally...")
    t0 = time.time()
    with Pool(processes=cpu_count()) as pool:
        results = pool.map(partial(dummy_task, size=size), steps)
    t1 = time.time()
    pt = t1 - t0
    print(f"Time taken: {pt:.3f} seconds\n")

    print("Running with tqdm...")
    t2 = time.time()
    with Pool(processes=cpu_count()) as pool:
        results = list(tqdm(pool.imap(partial(dummy_task, size=size), steps),
                            total=len(steps)))
    t3 = time.time()
    pt_tqdm = t3 - t2
    print(f"Time taken: {pt_tqdm:.3f} seconds")
    print(f"Pool with tqdm is {pt_tqdm/pt:.3f} times slower.")
There is a similar question here from six years ago, but it has no usable answer. What I'm looking for is a way to show a progress bar, or some other form of progress information, without sacrificing speed. Is there any way to achieve this?
NB: I only used imap in the tqdm example because I could not figure out how to use map with tqdm.
Your linked question does contain the real explanation: as Paul Smith points out in his answer, the problem is the difference between map and imap:

imap doesn't turn the iterable you give it into a list, nor does it break it into chunks (by default). It will iterate over the iterable one element at a time, and send them each to a worker process.
Chunking is crucial to performance for lightweight jobs like this one, so you can be confident the slowdown has nothing to do with tqdm itself.
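For context, pool.map() without an explicit chunksize picks one automatically. Here is a minimal sketch of that heuristic, paraphrased from CPython's Lib/multiprocessing/pool.py (the exact details may vary by Python version):

# Rough reconstruction of the default chunksize pool.map() computes
# when none is given (paraphrased from CPython's multiprocessing/pool.py).
def default_map_chunksize(n_items, n_workers):
    chunksize, extra = divmod(n_items, n_workers * 4)
    if extra:
        chunksize += 1
    return chunksize

# With the question's 500,000 tasks on an 8-core machine:
print(default_map_chunksize(500_000, 8))  # 15625
# ...whereas imap defaults to a chunksize of 1, so every single task
# pays a round trip through the inter-process queues.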
To add a progress bar without slowing the calculation down, pass your own chunksize argument:
with Pool(processes=cpu_count()) as pool:
    results = list(tqdm(pool.imap(partial(dummy_task, size=size), steps, chunksize=10),
                        total=len(steps)))
(There might be an optimal chunksize; I haven't tested for one.)
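If you want to pick a value empirically, you can just time a few candidates. A minimal sketch, reusing dummy_task, steps and size from the question (the candidate values below are arbitrary; run it under the if __name__ == "__main__" guard, as in the question):

import time
from functools import partial
from multiprocessing import Pool, cpu_count
from tqdm import tqdm

# Time the tqdm-wrapped imap for a few arbitrary chunksize candidates.
for cs in (1, 10, 100, 1000, 10000):
    t0 = time.time()
    with Pool(processes=cpu_count()) as pool:
        list(tqdm(pool.imap(partial(dummy_task, size=size), steps, chunksize=cs),
                  total=len(steps), leave=False))
    print(f"chunksize={cs}: {time.time() - t0:.3f} s")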
To quote the docs:
The chunksize argument is the same as the one used by the map() method. For very long iterables using a large value for chunksize can make the job complete much faster than using the default value of 1.
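As an aside, recent versions of tqdm also ship a convenience wrapper, tqdm.contrib.concurrent.process_map, which drives a concurrent.futures.ProcessPoolExecutor and draws the bar for you; it accepts a chunksize too, and the same advice applies. A minimal sketch, again reusing dummy_task, steps and size from the question:

from functools import partial
from tqdm.contrib.concurrent import process_map

# process_map shows a progress bar over a process pool; chunksize matters
# here for exactly the same reason as with pool.imap.
results = process_map(partial(dummy_task, size=size), steps, chunksize=1000)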