Tags: python, multithreading, parallel-processing, multiprocessing, joblib

Tracking progress of joblib.Parallel execution


Is there a simple way to track the overall progress of a joblib.Parallel execution?

I have a long-running execution composed of thousands of jobs, which I want to track and record in a database. However, to do that, whenever Parallel finishes a task, I need it to execute a callback, reporting how many jobs are left.

I've accomplished a similar task before with Python's stdlib multiprocessing.Pool, by launching a thread that records the number of pending jobs in Pool's job list.
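For reference, a minimal sketch of that kind of Pool-based tracking. Note it swaps in a different mechanism than the one described above: instead of a thread reading Pool's internal job list, it uses the documented `callback` argument of `apply_async`. The names `run_with_progress` and `square` are illustrative, not from the original post.

```python
import multiprocessing
import threading

def square(x):
    return x * x

def run_with_progress(items, processes=4):
    """Run square() over items in a Pool, logging how many jobs remain.

    Illustrative sketch: apply_async's callback runs in a result-handler
    thread of the parent process, so a plain lock suffices.
    """
    lock = threading.Lock()
    state = {"remaining": len(items), "log": []}

    def on_done(_result):
        with lock:
            state["remaining"] -= 1
            # This is where one could record state["remaining"] in a database.
            state["log"].append(state["remaining"])

    with multiprocessing.Pool(processes=processes) as pool:
        pending = [pool.apply_async(square, (x,), callback=on_done)
                   for x in items]
        results = [r.get() for r in pending]
    return results, state["log"]
```

Because the lock serializes the decrement and the append, the log is a strictly decreasing count of pending jobs, ending at 0.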

Looking at the code, Parallel inherits Pool, so I thought I could pull off the same trick, but it doesn't seem to use that list, and I haven't been able to figure out any other way to "read" its internal status.


Solution

  • Why can't you simply use tqdm? The following worked for me:

    from joblib import Parallel, delayed
    from tqdm import tqdm
    
    def myfun(x):
        return x**2
    
    results = Parallel(n_jobs=8)(delayed(myfun)(i) for i in tqdm(range(1000)))
    # 100%|██████████| 1000/1000 [00:00<00:00, 10563.37it/s]
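One caveat: wrapping the *input* iterable with tqdm measures how fast jobs are dispatched, not how fast they complete. To run a callback as jobs actually finish (e.g. to record the remaining count in a database, as the question asks), a known workaround is to temporarily subclass joblib's internal `BatchCompletionCallBack`. A hedged sketch follows: `report_remaining` and the `on_batch_done(remaining)` hook are names of my own invention, and the patched class is an undocumented joblib internal that may change between releases.

```python
import contextlib
import threading

import joblib
from joblib import Parallel, delayed

@contextlib.contextmanager
def report_remaining(on_batch_done, total):
    """Call on_batch_done(remaining) each time a batch of jobs finishes.

    Patches joblib.parallel.BatchCompletionCallBack (undocumented
    internal), restoring it on exit.
    """
    lock = threading.Lock()
    state = {"done": 0}
    original = joblib.parallel.BatchCompletionCallBack

    class ReportingCallback(original):
        def __call__(self, *args, **kwargs):
            # Runs in the parent process as each batch completes.
            with lock:
                state["done"] += self.batch_size
                on_batch_done(total - state["done"])
            return super().__call__(*args, **kwargs)

    joblib.parallel.BatchCompletionCallBack = ReportingCallback
    try:
        yield
    finally:
        joblib.parallel.BatchCompletionCallBack = original

def myfun(x):
    return x ** 2

# The hook here just logs the remaining count; a database write
# would go in its place.
remaining_log = []
with report_remaining(remaining_log.append, total=100):
    results = Parallel(n_jobs=2)(delayed(myfun)(i) for i in range(100))
```

The lock makes the running total safe against concurrent callbacks, so the logged counts decrease strictly and end at 0 once all jobs have completed.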