I have written a C++ code that uses both OpenMP and Open MPI. I want to use, say, 3 nodes (so size_Of_Cluster should be 3) and use OpenMP within each node to parallelize the for loop (there are 24 cores per node). In essence, I want one MPI rank assigned to each node. The Slurm script I have written is as follows. (I have tried many variations but could not come up with the "correct" one; I would be grateful if you could help me.)
#!/bin/bash
#SBATCH -N 3
#SBATCH -n 72
#SBATCH -p defq
#SBATCH -A akademik
#SBATCH -o %J.out
#SBATCH -e %J.err
#SBATCH --job-name=MIXED
module load slurm
module load shared
module load gcc
module load openmpi
export OMP_NUM_THREADS=24
mpirun -n 3 --bynode ./program
Using srun did not help.
The relevant lines are:
#SBATCH -N 3
#SBATCH -n 72
export OMP_NUM_THREADS=24
This means you have 72 MPI processes, and each one creates 24 threads. For that to be efficient you would need 24x72 cores, which you don't have. You should specify:
#SBATCH -n 3
Then you will have 3 processes, with 24 threads per process.
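Putting that together, a corrected version of your script could look like the sketch below. The added `-c 24` (`--cpus-per-task`) directive is my suggestion, not something required by the answer above: it tells Slurm to reserve 24 cores for each task so the OpenMP threads have somewhere to run.

```shell
#!/bin/bash
#SBATCH -N 3            # 3 nodes
#SBATCH -n 3            # 3 MPI tasks total, so one rank per node
#SBATCH -c 24           # 24 cores per task, for the OpenMP threads
#SBATCH -p defq
#SBATCH -A akademik
#SBATCH -o %J.out
#SBATCH -e %J.err
#SBATCH --job-name=MIXED

module load slurm
module load shared
module load gcc
module load openmpi

export OMP_NUM_THREADS=24
mpirun -n 3 ./program
```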
You don't have to worry about the placement of the ranks on the nodes: that is done by the runtime. You could, for instance, let each process print the result of MPI_Get_processor_name to confirm.
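A minimal sketch of such a check is below. It assumes the program is built with an MPI compiler wrapper and OpenMP enabled (e.g. `mpicxx -fopenmp check.cpp -o check`, file name hypothetical) and launched under the batch script; it cannot run outside an MPI environment.

```cpp
#include <cstdio>
#include <mpi.h>
#include <omp.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    char name[MPI_MAX_PROCESSOR_NAME];
    int len;
    MPI_Get_processor_name(name, &len);

    // With -N 3 and -n 3, each rank should report a different host name,
    // and OMP_NUM_THREADS=24 should show up as 24 available threads.
    std::printf("rank %d on %s, max OpenMP threads %d\n",
                rank, name, omp_get_max_threads());

    MPI_Finalize();
    return 0;
}
```

If two ranks print the same host name, the placement is not one rank per node and the `-N`/`-n` settings should be revisited.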