joblib, ipython-parallel

joblib Parallel running out of memory


I have something like this:

outputs = Parallel(n_jobs=12, verbose=10)(delayed(_process_article)(article, config) for article in data)
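
For reference, a minimal self-contained version of this pattern looks like the sketch below; _process_article here is just a dummy stand-in for my real function, which processes one article given a config.

from joblib import Parallel, delayed

def _process_article(article, config):
    # Dummy stand-in: the real function parses and transforms one article.
    return len(article) * config.get("factor", 1)

config = {"factor": 2}
data = ["article %d" % i for i in range(90_000)]  # roughly the size of the real run

# All 90,000 results are collected into a single in-memory list in the parent.
outputs = Parallel(n_jobs=12, verbose=10)(
    delayed(_process_article)(article, config) for article in data
)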

Case 1: Run on Ubuntu with 80 cores:

CPU(s):                80
Thread(s) per core:    2
Core(s) per socket:    20
Socket(s):             2

There are a total of 90,000 tasks. At around 67k tasks it fails and is terminated with:

joblib.externals.loky.process_executor.BrokenProcessPool: A process in the executor was terminated abruptly, the pool is not usable anymore.

When I monitor top at around the 67k mark, I see a sharp fall in available memory:

top - 11:40:25 up 2 days, 18:35,  4 users,  load average: 7.09, 7.56, 7.13
Tasks:  32 total,   3 running,  29 sleeping,   0 stopped,   0 zombie
%Cpu(s):  7.6 us,  2.6 sy,  0.0 ni, 89.8 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 33554432 total,       40 free, 33520996 used,    33396 buff/cache
KiB Swap:        0 total,        0 free,        0 used.       40 avail Mem

Case 2: Mac with 8 cores

hw.physicalcpu: 4
hw.logicalcpu: 8

But on the Mac it is much, much slower, and surprisingly it does not get killed at 67k.

Additionally, I reduced the parallelism (in Case 1) to 2 and to 4, and it still fails. :( Why is this happening? Has anyone faced this issue before and found a fix?

Note: when I run it for 50,000 tasks, it completes fine and does not give any problems.

Thank you!


Solution

  • Got a machine with more memory (128 GB), and that solved the problem!
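
If getting more RAM is not an option, one mitigation that is sometimes suggested (not what was done here) is to consume the results in chunks, so that neither the workers nor the parent process ever hold all 90,000 results at once. A rough sketch, reusing the names from the question (process_article is passed in so the snippet stays self-contained):

from joblib import Parallel, delayed

def run_in_chunks(process_article, data, config, n_jobs=12, chunk_size=5000):
    # Process the input in fixed-size chunks so the parent's result list (and the
    # amount of work queued at any one time) stays bounded.
    outputs = []
    with Parallel(n_jobs=n_jobs, verbose=10) as parallel:  # reuse one worker pool
        for start in range(0, len(data), chunk_size):
            chunk = data[start:start + chunk_size]
            outputs.extend(
                parallel(delayed(process_article)(article, config) for article in chunk)
            )
            # Finished chunks could instead be written to disk here (e.g. with
            # joblib.dump) rather than kept in outputs, bounding memory further.
    return outputs

# usage (hypothetical): outputs = run_in_chunks(_process_article, data, config)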