Tags: parallel-processing, mpi, openmp, slurm, hpc

Mental Model for Hybrid MPI/OpenMP with SLURM


Question

Edit: If you feel the question could be improved, please comment with suggestions since downvoting without comment is not particularly constructive.

I am trying to develop a clear mental model for using SLURM to request resources on HPC systems for hybrid MPI/OpenMP jobs. Thinking about it more, I realized there are some gaps in my understanding. I use SLURM terminology here, where "CPU" means a single core. Is the image below a correct model for how a command like

srun --ntasks=2 --cpus-per-task=4 --hint=nomultithread hybrid_ompmpi.bin

allocates resources from a simple cluster with a single compute node consisting of a single socket with 8 physical CPUs, each of which has 2 hardware threads (for a total of 8 * 2 = 16 logical CPUs)? Note that hybrid_ompmpi.bin is just a dummy program name.

[Image: diagram of the proposed resource allocation]

My understanding is that, since --hint=nomultithread is passed, only a single hardware thread per CPU is utilized in the requested resources. Moreover, each MPI process will utilize 4 CPUs (though this seems off to me, since I normally think of 1 MPI process per CPU).
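One way to check a model like this empirically is to have each task report the logical CPUs it is bound to. The following is a minimal Python sketch, not part of the original setup. It assumes a Linux node (os.sched_getaffinity is Linux-only), that srun sets SLURM_PROCID for each task, and that the cluster is configured to enforce CPU binding. The function name describe_binding and the file name show_binding.py are hypothetical.

```python
import os


def describe_binding():
    """Return this process's SLURM task rank and the logical CPUs it may run on."""
    # SLURM_PROCID is set by srun for each task; fall back to 0 outside SLURM.
    rank = int(os.environ.get("SLURM_PROCID", "0"))
    # Linux-only call: the set of logical CPU ids this process is allowed to use.
    cpus = sorted(os.sched_getaffinity(0))
    return rank, cpus


if __name__ == "__main__":
    rank, cpus = describe_binding()
    print(f"task {rank}: {len(cpus)} logical CPUs -> {cpus}")
```

Launched with the same options as in the question, for example

srun --ntasks=2 --cpus-per-task=4 --hint=nomultithread python3 show_binding.py

each of the two tasks should report 4 logical CPUs if the model holds and binding is enforced. Which CPU ids share a physical core can be read from lscpu --extended on the node.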

Context/Definitions

SLURM (see [R1]) uses "CPU" to mean a physical or logical core (see [R7]). The microprocessor chip itself is called a socket in SLURM. Compute nodes can, of course, have multiple sockets.

Threaded memory model taken from [R3] and relation to processes taken from [R4].

From [R3], a process is an independent unit of computation that has ownership of a portion of memory and control over resources in user space.

The meaning of things like nodes, tasks, cpus per task, etc. taken partially from [R5].

The meaning of --hint=nomultithread taken from [R6].

MPI processes and cores [R8].

References

[R1] : University of Siegen: SLURM Terminology

[R2] : Figure 3 from What Every Computer Programmer Should Know About Memory

[R3] : Chapter 7 and Chapter 8 of Parallel and High Performance Computing

[R4] : SO: Does each process have it's own section of data, text , stack and heap in the memory?

[R5] : SO: HPC cluster: select the number of CPUs and threads in SLURM sbatch

[R6] : man srun

[R7] : SO: So what are logical cpu cores (as opposed to physical cpu cores)?

[R8] : SO: MPI cores or processors?


Solution

  • To summarize from the comments:

    A hybrid MPI + OpenMP approach consists of allocating tasks, where each task is assigned physical and/or logical cores. OpenMP then runs its threads on the physical and/or logical cores available to a given task. In the example in the question, each task gets 4 physical cores and OpenMP uses one thread on each physical core, as opposed to, say, 2 threads on each of 2 physical cores, which may occur depending on the operating system scheduling should the user not pass

    --hint=nomultithread
    

    Therefore, the mental model shown in the question is correct.
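The difference between the two cases in the answer comes down to simple arithmetic. The sketch below works it through in Python under a deliberately simplified assumption: without the hint, both hardware threads of a core may be counted as CPUs, so a task's CPUs can be packed onto as few cores as possible. Real placement depends on the SLURM configuration and the operating system scheduler. The function name physical_cores_per_task is hypothetical.

```python
def physical_cores_per_task(cpus_per_task, threads_per_core, nomultithread):
    """Physical cores one task occupies under a simplified allocation model."""
    if nomultithread:
        # Only one hardware thread per core is used, so each CPU is its own core.
        return cpus_per_task
    # Assumption: all hardware threads of a core may be used, so the request
    # is packed onto as few cores as possible (ceiling division).
    return -(-cpus_per_task // threads_per_core)


# Values from the question: 4 CPUs per task, 2 hardware threads per core.
with_hint = physical_cores_per_task(4, 2, nomultithread=True)      # 4 cores, 1 thread each
without_hint = physical_cores_per_task(4, 2, nomultithread=False)  # 2 cores, 2 threads each
print(with_hint, without_hint)
```

With 2 tasks, the first case occupies all 8 physical cores of the example node, while the packed case would occupy only 4 of them.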