I'm running a script on a Slurm cluster that could benefit from parallel processing, so I'm trying to implement MPI. However, it doesn't seem to allow me to run processes on multiple nodes. I don't know if this is normally done automatically, but whenever I set --nodes=2 in the batch file for submission, I get the error message:
"Warning: can't run 1 processes on 2 nodes, setting nnodes to 1."
I've been trying to get it to work with a simple Hello World script, but still run into the above error. I added --oversubscribe to the options when I run the MPI script, but still get this error.
#SBATCH --job-name=a_test
#SBATCH --mail-type=ALL
#SBATCH --ntasks=1
#SBATCH --cpu-freq=high
#SBATCH --nodes=2
#SBATCH --cpus-per-task=2
#SBATCH --mem-per-cpu=1gb
#SBATCH --mem-bind=verbose,local
#SBATCH --time=01:00:00
#SBATCH --output=out_%x.log
module load python/3.6.2
mpirun -np 4 --oversubscribe python par_PyScript2.py
I still get the expected output, but only after the error message:
"Warning: can't run 1 processes on 2 nodes, setting nnodes to 1."
I'm worried that without being able to run on multiple nodes, my actual script will be a lot slower.
The reason for the warning is this line:
#SBATCH --ntasks=1
where you specify that you're going to run only 1 MPI process, just before you request 2 nodes. Slurm can't place a single task across two nodes, so it shrinks the allocation to 1 node.
--ntasks
sets the number of processes to run (the number of MPI ranks, in your case). You then override it with the equivalent -np flag in your mpirun call, which is why you still see the expected output despite the warning.
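As a sketch, one way to make the request self-consistent (assuming you actually want 4 ranks spread over 2 nodes; the job name and script name are carried over from your example) would be:

```shell
#!/bin/bash
#SBATCH --job-name=a_test
#SBATCH --nodes=2               # two nodes, as before
#SBATCH --ntasks=4              # 4 MPI ranks in total, so Slurm has tasks to spread
#SBATCH --ntasks-per-node=2     # 2 ranks per node
#SBATCH --mem-per-cpu=1gb
#SBATCH --time=01:00:00
#SBATCH --output=out_%x.log

module load python/3.6.2
# srun inherits the task count from the allocation; -n 4 is then redundant but harmless
srun -n 4 python par_PyScript2.py
```

With --ntasks matching what you pass to the launcher, --oversubscribe should no longer be needed.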
For reference, this is the script I run on my system:
#!/bin/bash
#SBATCH -C knl
#SBATCH -q regular
#SBATCH -t 00:10:00
#SBATCH --nodes=2
module load python3
START_TIME=$SECONDS
srun -n 4 python mpi_py.py >& py_${SLURM_JOB_ID}.log
ELAPSED_TIME=$(($SECONDS - $START_TIME))
echo $ELAPSED_TIME
Performance notes: for placement and pinning, also look into srun's
-c
and
--cpu-bind=
options (see the Slurm srun documentation).