Tags: slurm, sbatch

several mpiruns in parallel on several nodes


I want to run two MPI programs in parallel in the same job script. In SLURM I would usually just write a script for sbatch (shortened):

#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4
mpirun program1 &
mpirun program2

This works fine. The two programs communicate with each other internally and coordinate their execution, so overcommitting is fine. Moreover, they require each other and cannot run stand-alone in the present configuration.

However, if I want to extend this to several nodes, e.g.

#SBATCH --nodes=2
#SBATCH --ntasks-per-node=2

SLURM does not start the first mpirun in the background. Instead, it runs in the foreground and fails because it does not find the second step; the second then also fails because it does not find the first.

I am a bit at a loss here, because this is the suggested solution to similar problems (e.g. Run a "monitor" task alongside mpi task in SLURM), and I do not see why it should not work across several nodes. Indeed it does, for instance on PBS.
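For reference, the full failing two-node script is simply the single-node script above with the changed directives (shortened in the same way):

#SBATCH --nodes=2
#SBATCH --ntasks-per-node=2
mpirun program1 &
mpirun program2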


Solution

  • With SLURM the main issue is that srun should be used instead of mpirun, i.e.:

    #SBATCH --nodes=1
    #SBATCH --ntasks-per-node=4
    srun --overlap program1 &
    srun --overlap program2
    wait
    

    When running on multiple nodes, using srun instead of mpirun is crucial: --overlap allows the job steps to share the allocated resources, and wait ensures that the batch script does not exit before all steps have finished. The same pattern carries over to the multi-node case, as sketched below.
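
    For the two-node configuration from the question, a minimal sketch could look like the following (program names taken from the question; this assumes a Slurm version recent enough to support --overlap):

    #SBATCH --nodes=2
    #SBATCH --ntasks-per-node=2
    # Both steps run within the same allocation and share it via --overlap.
    srun --overlap program1 &
    srun --overlap program2
    # Do not let the batch script exit before both steps have finished.
    wait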