Tags: python, multiprocessing, cpu, cpu-cores, htop

Python multiprocessing Pool.map uses all cores instead of the specified number


I am using the multiprocessing module of Python, and more precisely the Pool class and its map method, to run a function (called evaluate) in parallel over a list of Python objects (called object_list).

The problem occurs on a computer that has 2 CPUs with 20 cores each:

>>> import multiprocessing as mp
>>> mp.cpu_count()
40

Each run of the function is quite long and independent of the others, so I chose the multiprocessing module to run my function over my list instead of doing it serially with a for loop, for instance.

So basically I am using the Pool class and the map method, and I set the processes argument to 8. The syntax is then:

# import the module
import multiprocessing as mp

# creating a Pool instance that should use 8 worker processes
pool = mp.Pool(processes=8)

# running my evaluations in parallel using Pool.map
pool.map(evaluate, object_list)
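
To make the pattern concrete, here is a self-contained toy version of it. The evaluate function below is only a placeholder that burns some CPU; it is not my real function, and object_list is just a list of integers here:

import multiprocessing as mp

def evaluate(obj):
    # placeholder for my real, long-running, independent evaluation
    return sum(i * i for i in range(100_000)) + obj

if __name__ == "__main__":
    object_list = list(range(100))
    # 8 worker processes, exactly as in my real script
    with mp.Pool(processes=8) as pool:
        results = pool.map(evaluate, object_list)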

Unfortunately, my real code is too long, and I have not been able to reduce it to a minimal working example. All I can do is describe the syntax I use (as above). So here is my problem.

I run my script on 2 computers. The first one has a single CPU with 12 cores. The second one has 2 CPUs with 20 cores each.

  1. When I run my script on the first computer with 8 processes (processes=8), the htop command shows, as expected, 8 lines for my script (called EA_launch.py): multiprocessing runs as expected.

  2. Now, on the second computer, all 40 cores are running, even though it is the exact same script and only 8 processes are requested. As we can see below, 8 processes are written in white and the rest in green. I don't really understand the green ones.

(screenshot: htop output on the second computer, with many green python EA_launch.py lines)

Note that in the screenshot above, the green python EA_launch.py lines keep going for over 200 lines. The htop command also shows that all 40 cores are running (instead of 8), and the evaluation is slowed down dramatically. I understand that running code in parallel does not automatically mean better performance. But in my case, running in parallel makes the code much faster on the first computer (as expected), while on the second computer this "strange" behavior with the cores makes it incredibly slower.

Can anyone enlighten me on this matter? It seems that the second computer having 2 CPUs messes with the multiprocessing module.


Solution

  • Thanks a lot @CharlesDuffy! That post: limit number of threads in numpy? (and the relevant comments) solved my problem.

    Adding those lines:

    import os

    # limit MKL / numexpr / OpenMP to a single thread per worker process
    os.environ["MKL_NUM_THREADS"] = "1"
    os.environ["NUMEXPR_NUM_THREADS"] = "1"
    os.environ["OMP_NUM_THREADS"] = "1"
    

    BEFORE IMPORTING numpy made my code work well on both systems!
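
    To make the ordering explicit, here is roughly what the top of my script looks like now. The evaluate function below is only a simplified stand-in for my real (numpy-based) evaluation, not the actual code:

    import os

    # cap the MKL / numexpr / OpenMP thread pools before numpy is imported
    os.environ["MKL_NUM_THREADS"] = "1"
    os.environ["NUMEXPR_NUM_THREADS"] = "1"
    os.environ["OMP_NUM_THREADS"] = "1"

    import numpy as np            # imported only after the variables are set
    import multiprocessing as mp

    def evaluate(obj):
        # stand-in for the real evaluation, which relies on numpy
        return float(np.linalg.norm(np.full(1000, obj)))

    if __name__ == "__main__":
        object_list = list(range(100))
        # 8 worker processes, each now restricted to a single BLAS thread
        with mp.Pool(processes=8) as pool:
            results = pool.map(evaluate, object_list)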