Tags: python, bash, multiprocessing, slurm, sbatch

How to properly use Slurm sbatch with Python multiprocessing


I want to run code that uses multiprocessing on a server with a Slurm architecture. I want to limit the number of available CPUs and have the code create a child process for each of them.

My code can be simplified as follows:

def Func(ins) :
  ###
  things...
  ###
  return var

if __name__ == '__main__' :
  from multiprocessing import Pool
  from multiprocessing import active_children
  from multiprocessing import cpu_count

  p = Pool()
  print("active cpus = ", cpu_count())
  print("open process = ", p._processes)
  print("active_children = ", len(active_children()))
  results = p.map(Func, range(2000))
  p.close()

  exit()

driven by this batch script:

#!/bin/bash

#SBATCH --time=1:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=48
#SBATCH --mem=40000 # Memory per node (in MB).

module load python 
conda activate myenv
python3 test.py

echo 'done!'

Whatever combination of parameters I try, the code always runs on the maximum number of CPUs (272):

active cpus =  272
open process =  272
active_children =  272
done!

I launch the job with the command

sbatch job.sh

What am I doing wrong?


Solution

  • Your Python code is responsible for creating the desired number of processes based on the Slurm allocation. A default `Pool()` spawns one worker per CPU on the node, ignoring what Slurm actually allocated to the job.

    If you want, as is often the case, to have one process per allocated CPU, your code should look like this:

    if __name__ == '__main__' :
      import os
      from multiprocessing import Pool
      from multiprocessing import active_children
      from multiprocessing import cpu_count
    
      ncpus = int(os.environ['SLURM_CPUS_PER_TASK'])
      p = Pool(ncpus)
    
      print("active cpus = ", cpu_count())
      print("open process = ", p._processes)
      print("active_children = ", len(active_children()))
      results = p.map(Func, range(2000))
      p.close()
    
      exit()
    

    The SLURM_CPUS_PER_TASK environment variable will hold the value you specified with the #SBATCH --cpus-per-task=48 line in the submission script. Note that multiprocessing.cpu_count() reports all CPUs present on the node, not the CPUs allocated to your job, which is why your prints showed 272.
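
    If you also want the script to run outside of Slurm (e.g. for local testing), a small helper can read the variable defensively. This is a sketch of my own; the function name is made up, and os.sched_getaffinity is Linux-specific (on other platforms you could fall back to cpu_count()):

    ```python
    import os
    from multiprocessing import Pool

    def allocated_cpus():
        """Number of CPUs to use: the Slurm allocation if present,
        otherwise the CPUs this process is actually allowed to run on."""
        try:
            # Set by Slurm from #SBATCH --cpus-per-task=...
            return int(os.environ['SLURM_CPUS_PER_TASK'])
        except KeyError:
            # Not under Slurm: use the process affinity mask (Linux-only),
            # which also respects cgroup/taskset restrictions.
            return len(os.sched_getaffinity(0))

    if __name__ == '__main__':
        with Pool(allocated_cpus()) as p:
            results = p.map(abs, range(2000))
    ```

    Using the affinity mask rather than cpu_count() as the fallback matters on shared machines: it reflects any CPU restriction imposed on the process, not the total hardware count.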