
How do I run many Singularity/Apptainer containers from one Python script, using multiple CPUs and nodes?

Problem statement

I have a Python program that needs to launch a number of Singularity containers in parallel.

Is it possible to do this, exploiting all of the available hardware, using only built-in libraries (subprocess, concurrent.futures, etc.)?

The 'host' script runs on 1 CPU and is launched by SLURM. It needs to launch the containers, wait for them to complete, do some analysis, and repeat.

For example, if I have 40 containers each needing 2 CPUs, and two nodes each with 76 CPUs, then there should be something like:

Node 1 (76 CPUs)           Node 2 (76 CPUs)
Host script (1 CPU)        3 containers (6 CPUs)
37 containers (74 CPUs)    70 spare CPUs
1 spare CPU
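
The layout above follows from simple packing arithmetic. A minimal sketch, assuming a hypothetical helper `pack_containers` that fills nodes in order and reserves one CPU on the first node for the host script (SLURM's real placement may differ):

```python
def pack_containers(n_containers, cpus_each, node_cpus, host_cpus=1):
    """Greedily fill nodes in order; the host script takes CPUs on node 0.

    Returns a list of (containers_placed, spare_cpus), one entry per node.
    """
    layout = []
    remaining = n_containers
    for index, total in enumerate(node_cpus):
        # Only the first node loses CPUs to the host script.
        free = total - (host_cpus if index == 0 else 0)
        placed = min(remaining, free // cpus_each)
        remaining -= placed
        layout.append((placed, free - placed * cpus_each))
    if remaining:
        raise ValueError(f"{remaining} containers do not fit")
    return layout


print(pack_containers(40, 2, [76, 76]))  # [(37, 1), (3, 70)]
```

The result matches the table: 37 containers and 1 spare CPU on the first node, 3 containers and 70 spare CPUs on the second.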


Singularity recipe (stress.def)

We use stress to fully utilise a given number of CPUs:

Bootstrap: docker
From: ubuntu:16.04

%post
    apt update -y
    apt install -y stress

%runscript
    echo $(uname -n)
    stress "$@"

Build with singularity build stress.simg stress.def.

Python host script (

Spin up 40 containers, each running the stress image with 2 CPUs for 10s:

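from subprocess import Popen

n_processes = 40
cpus_per_process = 2
stress_time = 10

# Arguments assumed from the description above: the stress image,
# 2 CPUs (-c) and a 10 s timeout (-t), forwarded to stress by the runscript.
command = [
    "singularity", "run", "stress.simg",
    "-c", str(cpus_per_process),
    "-t", f"{stress_time}s",
]

# Start every container without blocking, then wait for all of them.
processes = [Popen(command) for i in range(n_processes)]

for p in processes:
    p.wait()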

SLURM script

#!/bin/bash
#SBATCH -J stress
#SBATCH -A myacc
#SBATCH -p mypart
#SBATCH --output=%x_%j.out
#SBATCH --nodes=2
#SBATCH --ntasks=40
#SBATCH --cpus-per-task=2
#SBATCH --time=24:00:00



The above only runs on one of the two nodes. Total execution time is around 20s, and the Singularity containers do not all run at once: the first 38 run together (38 x 2 CPUs = 76 CPUs, i.e. one full node), and then the last two.

As such, it does not have the desired effect.
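
The 20s figure is consistent with everything being confined to a single node. A rough sketch of the arithmetic, assuming a hypothetical helper `expected_runtime` and ignoring container start-up overhead:

```python
import math


def expected_runtime(n_containers, cpus_each, usable_cpus, seconds_each):
    """Return (containers running at once, number of waves, total seconds)."""
    concurrent = usable_cpus // cpus_each
    waves = math.ceil(n_containers / concurrent)
    return concurrent, waves, waves * seconds_each


print(expected_runtime(40, 2, 76, 10))   # one node:  (38, 2, 20)
print(expected_runtime(40, 2, 152, 10))  # two nodes: (76, 1, 10)
```

With both nodes in use, all 40 containers would fit in one wave and the run should take roughly 10s instead of 20s.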


  • It turns out my question came from a misunderstanding of what should be handled by Singularity and what should be handled by SLURM.

    My mistake was thinking that Singularity could see and utilise other nodes; in reality, it can only see resources available on the current node.
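
One way to check this from the host script is to compare the CPUs visible to the local process with the allocation SLURM reports through its environment variables. A minimal sketch: `visible_resources` is a hypothetical helper, and the `SLURM_*` variables are only set inside a SLURM job (some only when the matching option was given to sbatch), so outside one they come back as `None`:

```python
import os


def visible_resources(env=None):
    """Report local CPU visibility alongside the SLURM allocation, if any."""
    env = os.environ if env is None else env
    # sched_getaffinity is Linux-only; fall back to cpu_count elsewhere.
    if hasattr(os, "sched_getaffinity"):
        local_cpus = len(os.sched_getaffinity(0))
    else:
        local_cpus = os.cpu_count()
    return {
        "local_cpus": local_cpus,                      # this node only
        "nodelist": env.get("SLURM_JOB_NODELIST"),     # all nodes in the job
        "num_nodes": env.get("SLURM_JOB_NUM_NODES"),
        "ntasks": env.get("SLURM_NTASKS"),
        "cpus_per_task": env.get("SLURM_CPUS_PER_TASK"),
    }


print(visible_resources())
```

Inside the two-node job, `local_cpus` can never exceed what one node offers, even though `nodelist` names both nodes.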


    1. Allocate all the resources that the overall job will need in the SLURM script, treating each container as a new task. So in the above example, that means setting ntasks=40 and cpus-per-task=2.
    2. Launch the subprocesses with srun. This lets SLURM allocate resources to the containers from the pool that has already been allotted to this particular job.


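    from subprocess import Popen

    n_processes = 40
    cpus_per_process = 2
    stress_time = 10

    # Everything after "srun" is assumed from the question, not from the original answer.
    srun_command = [
        "srun",  # <--------- MODIFICATION
        "--ntasks=1", f"--cpus-per-task={cpus_per_process}",  # assumed: one task per step
        "singularity", "run", "stress.simg",
        "-c", str(cpus_per_process), "-t", f"{stress_time}s",
    ]
    processes = [Popen(srun_command) for i in range(n_processes)]
    for p in processes:
        p.wait()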
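
The question also mentions concurrent.futures and a launch/wait/analyse/repeat loop. A possible variant, sketched under assumptions: `build_srun_command`, `run_batch` and `main` are hypothetical helpers, the `--ntasks=1` and `--cpus-per-task` srun options are my guess at how each step should be sized, and the image name and stress flags are taken from the question.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_srun_command(cpus, seconds, image="stress.simg"):
    """Build one srun step that runs the container as a single task."""
    return [
        "srun", "--ntasks=1", f"--cpus-per-task={cpus}",  # assumed step sizing
        "singularity", "run", image,
        "-c", str(cpus), "-t", f"{seconds}s",
    ]


def run_batch(commands):
    """Run all commands concurrently; return their exit codes in input order."""
    if not commands:
        return []

    def run_one(cmd):
        # Each worker thread just blocks on its own child process.
        return subprocess.run(cmd).returncode

    with ThreadPoolExecutor(max_workers=len(commands)) as pool:
        return list(pool.map(run_one, commands))


def main(rounds=3):
    """Launch, wait, analyse, repeat. The round count is arbitrary."""
    commands = [build_srun_command(2, 10) for _ in range(40)]
    for round_number in range(rounds):
        codes = run_batch(commands)
        failed = sum(code != 0 for code in codes)
        print(f"round {round_number}: {failed} of {len(codes)} containers failed")
```

Calling `main()` from the script submitted with sbatch would run three rounds of 40 containers. Threads are enough here because each one only waits on a child process; the CPU-heavy work happens inside the containers that srun places.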