[SOLVED] slurm how to get the cpu_ids from within an sbatch job?

What's the best way to get the valid CPU IDs from within a running job? My idea is to make an allocation --> wrap a docker command with the limits of the allocation --> run nvidia-docker on a remote GPU server.

To limit the Docker container to the allocation, I need to tell it the CPU IDs.

My job submission will look like:

sbatch -o test.txt -c2 -n 10 --mem=10GB --wrap="job that needs the cpu_ids"

Solution

Another way (from inside the job, on the compute node) is to parse the hexadecimal SLURM_CPU_BIND_LIST bitmask:

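python -c '
import os
# SLURM_CPU_BIND_LIST holds the CPU binding as a hexadecimal mask, e.g. 0x33
s = os.environ["SLURM_CPU_BIND_LIST"]
v = int(s.strip(), base=16)
# Bit i set (bit 0 is the least significant) means CPU i is in the binding
idxs = [i for i in range(v.bit_length()) if (v >> i) & 1]
print(",".join(str(x) for x in idxs))
'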

Output (zero-indexed CPU_IDs):

76,77,80,81

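Note that the IDs decoded from SLURM_CPU_BIND_LIST are Slurm's CPU IDs, which are not guaranteed to match the operating system's numbering. (The naive mapping system_cpu_id = slurm_cpu_id might hold on a given machine, but check before relying on it.)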
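
One way to check that mapping, and to get a string that could be handed to docker, is sketched below. This is only a sketch under some assumptions: the cluster actually binds the job's processes to their CPUs (so the kernel affinity mask reflects the allocation), SLURM_CPU_BIND_LIST contains hex masks rather than plain IDs, and a comma-separated list of masks should simply be merged. The helper names mask_to_cpu_ids, affinity_cpu_ids and to_cpuset are made up for illustration. os.sched_getaffinity is a standard Python call, but Linux-only.

```python
import os


def mask_to_cpu_ids(masks: str) -> list[int]:
    """Decode one or more comma-separated hex masks into sorted bit positions."""
    combined = 0
    for mask in masks.split(","):
        # int() accepts an optional 0x prefix when base=16
        combined |= int(mask.strip(), 16)
    return [i for i in range(combined.bit_length()) if (combined >> i) & 1]


def affinity_cpu_ids() -> list[int]:
    """CPU IDs this process may run on, as numbered by the kernel (Linux only)."""
    return sorted(os.sched_getaffinity(0))


def to_cpuset(ids: list[int]) -> str:
    """Format IDs as a comma-separated list, e.g. for docker run --cpuset-cpus."""
    return ",".join(str(i) for i in ids)


if __name__ == "__main__":
    os_ids = affinity_cpu_ids()
    masks = os.environ.get("SLURM_CPU_BIND_LIST")
    if masks:
        slurm_ids = mask_to_cpu_ids(masks)
        if slurm_ids != os_ids:
            # The naive mapping does not hold here; prefer the kernel's view.
            print(f"warning: Slurm IDs {slurm_ids} differ from OS IDs {os_ids}")
    print(to_cpuset(os_ids))
```

The printed string can then be passed on, for example as docker run --cpuset-cpus=76,77,80,81 followed by the rest of the command. Since docker interprets those numbers as the host's CPU numbers, the affinity-based list is probably the safer one to use when the two disagree.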