Edit: If you feel the question could be improved, please comment with suggestions since downvoting without comment is not particularly constructive.
I am trying to develop a clear mental model for using SLURM to request resources on HPC systems for hybrid MPI/OpenMP jobs. In thinking about it more, I realized there are some gaps in my understanding. Below I use SLURM terminology, in which "CPU" means a single core. Is the image below a correct model for how a command like
srun --ntasks=2 --cpus-per-task=4 --hint=nomultithread hybrid_ompmpi.bin
allocates resources from a simple cluster with a single compute node consisting of a single socket with 8 physical CPUs and each physical CPU has 2 threads (for a total of 8 * 2 = 16 logical CPUs)? Note that hybrid_ompmpi.bin
is just a dummy program name.
My understanding is that, because the request includes --hint=nomultithread, only a single thread per CPU is used. Moreover, each MPI process will use 4 CPUs (though this seems off to me, since I normally think of 1 MPI process per CPU).
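To make the arithmetic behind that request explicit, here is a small sketch. The variable names are my own and the numbers are simply the ones assumed in the example above (1 socket, 8 physical cores, 2 hardware threads per core); nothing here queries SLURM itself.

```shell
#!/bin/bash
# Assumed hardware, taken from the example node described above.
physical_cores=8
threads_per_core=2

# Values from the srun line in the question.
ntasks=2
cpus_per_task=4

# Logical CPUs the node exposes when both hardware threads count.
logical_cpus=$(( physical_cores * threads_per_core ))

# With --hint=nomultithread each requested CPU maps to one physical core.
requested_cores=$(( ntasks * cpus_per_task ))

echo "logical CPUs on the node: ${logical_cpus}"
echo "physical cores requested: ${requested_cores}"

if [ "${requested_cores}" -le "${physical_cores}" ]; then
    echo "request fits on the node"
else
    echo "request exceeds the node"
fi
```

With these numbers the request uses every physical core on the node, one hardware thread each, split into two groups of four.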
SLURM uses "CPU" to mean a physical or logical core (see [R1] and [R7]). What is commonly called the CPU (the microprocessor chip) is called a socket in SLURM. Compute nodes can, of course, have multiple sockets.
Threaded memory model taken from [R3] and relation to processes taken from [R4].
From [R3], a process is an independent unit of computation that has ownership of a portion of memory and control over resources in user space.
The meaning of things like nodes, tasks, cpus per task, etc. taken partially from [R5].
The meaning of --hint=nomultithread taken from [R6].
MPI processes and cores [R8].
[R1] : University of Siegen: SLURM Terminology
[R2] : Figure 3 from What Every Computer Programmer Should Know About Memory
[R3] : Chapter 7 and Chapter 8 of Parallel and High Performance Computing
[R4] : SO: Does each process have it's own section of data, text , stack and heap in the memory?
[R5] : SO: HPC cluster: select the number of CPUs and threads in SLURM sbatch
[R6] : man srun
[R7] : SO: So what are logical cpu cores (as opposed to physical cpu cores)?
[R8] : SO: MPI cores or processors?
To summarize from the comments:
A hybrid MPI + OpenMP approach consists of allocating tasks, where each task is given a set of physical and/or logical cores. OpenMP then runs its threads on the physical and/or logical cores available to a given task. In the example in the question, each task gets 4 physical cores and OpenMP uses one thread on each physical core, as opposed to, say, 2 threads on each of 2 physical cores, which may occur depending on operating system scheduling if the user does not pass --hint=nomultithread.
Therefore, the mental model shown in the question is correct.
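For completeness, a minimal batch-script sketch of the same request, assuming the job is submitted with sbatch and that hybrid_ompmpi.bin (the dummy name from the question) sits in the working directory. The OMP_PLACES and OMP_PROC_BIND settings are standard OpenMP environment variables that I am adding as a suggestion; they were not part of the original command. The srun guard is only there so the script does not fail when tried outside a SLURM environment.

```shell
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=2
#SBATCH --cpus-per-task=4
#SBATCH --hint=nomultithread

# One OpenMP thread per CPU that SLURM gave each task.
# SLURM_CPUS_PER_TASK is only set inside a job that requested --cpus-per-task,
# so fall back to 1 when the script is run outside SLURM.
export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK:-1}"

# Optional (my addition): ask the OpenMP runtime to pin threads to cores.
export OMP_PLACES=cores
export OMP_PROC_BIND=close

# Pass --cpus-per-task explicitly, since some SLURM versions do not let
# srun inherit it from the sbatch request.
if command -v srun >/dev/null 2>&1; then
    srun --cpus-per-task="${OMP_NUM_THREADS}" --hint=nomultithread ./hybrid_ompmpi.bin
else
    echo "srun not found; would launch with OMP_NUM_THREADS=${OMP_NUM_THREADS}"
fi
```

Inside the job this should give 2 MPI ranks with 4 OpenMP threads each, one thread per physical core, which matches the picture in the question.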