I have an MPI application (let's say -np 6) in which I know ahead of time that ranks 0, 2, and 3 are computationally very light compared to ranks 1, 4, and 5. To conserve resources, I want to pin ranks 0, 2, and 3 to the same physical processing unit, and pin ranks 1, 4, and 5 each to a physical processing unit of their own.
I know there are many flavors of MPI out there and the syntax varies, but I cannot find anything that actually dictates the location of individual ranks, rather than just specifying something uniform like 2 ppn. I have to imagine this is possible; I am just not sure which concept it falls under: pinning, binding, mapping, etc.
Thanks for the help!
Open MPI supports what it calls rankfiles, which specify the mapping of each rank to a host and to processing elements on that host. You can read more in the man page for mpiexec
(the link is to the documentation for v2.1, which ships with, e.g., Ubuntu 18.04 LTS, but it is essentially the same in newer versions). Assuming you run everything on a single host with at least 4 CPU cores, the rankfile will look something like:
rank 0=hostname slot=0
rank 1=hostname slot=1
rank 2=hostname slot=0
rank 3=hostname slot=0
rank 4=hostname slot=2
rank 5=hostname slot=3
where hostname is the host name, possibly localhost.
Here is an example:
First, a small utility script show_affinity that displays the CPU affinity of the current MPI rank:
#!/bin/bash
echo "$OMPI_COMM_WORLD_RANK: $(grep Cpus_allowed_list /proc/self/status)"
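If the rankfile spans more than one host (multi-host use is an assumption; the example below runs on localhost only), a variant of the script that also prints the host name makes the output easier to attribute:

```shell
#!/bin/bash
# Like show_affinity, but prefixes each line with the host name.
# ${OMPI_COMM_WORLD_RANK:-?} prints "?" when run outside mpiexec,
# where Open MPI has not set the rank variable.
echo "$(hostname) rank ${OMPI_COMM_WORLD_RANK:-?}: $(grep Cpus_allowed_list /proc/self/status)"
```

Note that OMPI_COMM_WORLD_RANK is an Open MPI-specific environment variable; other MPI implementations name theirs differently (e.g., PMI_RANK).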
Second, a sample rankfile:
rank 0=localhost slot=0
rank 1=localhost slot=1
rank 2=localhost slot=0
rank 3=localhost slot=0
rank 4=localhost slot=2
rank 5=localhost slot=3
MPI launch of show_affinity using that rankfile:
$ mpiexec -H localhost -rf rankfile ./show_affinity
0: Cpus_allowed_list: 0-1
1: Cpus_allowed_list: 2-3
2: Cpus_allowed_list: 0-1
3: Cpus_allowed_list: 0-1
4: Cpus_allowed_list: 4-5
5: Cpus_allowed_list: 6-7
This CPU has hyperthreading enabled, so each rank is bound to both hardware threads of its assigned core.
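Per the same man page, slots can also be given as socket:core pairs if you want to pin relative to a particular socket. A sketch (the exact interpretation of slot numbers depends on your Open MPI version and its reported topology):

```
rank 0=hostname slot=0:0
rank 1=hostname slot=0:1
```

which binds rank 0 to core 0 and rank 1 to core 1, both on socket 0.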