slurm

Slurm: how to get the cpu_ids from within an sbatch job?


What's the best way to get the valid cpu ids from within a running job? My idea is to do an allocation --> wrap a docker command with the limits of the allocation --> run nvidia-docker on a remote GPU server.

To limit the Docker container to the allocation, I need to tell it the cpu_ids.

My job submission will look like:

sbatch -o test.txt -c2 -n 10 --mem=10GB --wrap="job that needs the cpu_ids"

Solution

  • Another way (from inside a node) is to parse the SLURM_CPU_BIND_LIST bitmask:

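    python -c '
    import os
    # SLURM_CPU_BIND_LIST is assumed to hold one hexadecimal CPU mask, e.g. 0x3 for CPUs 0 and 1
    s = os.environ["SLURM_CPU_BIND_LIST"]
    v = int(s.strip(), base=16)
    # Walk the binary digits from the least significant bit; a set bit i means CPU i is allowed
    idxs = [i for i, b in enumerate(reversed(f"{v:0b}")) if int(b)]
    print(",".join(f"{x}" for x in idxs))
    '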
    

    Output (zero-indexed CPU_IDs):

    76,77,80,81
    

    Note that these are Slurm's CPU_IDs, which might not match the system's CPU numbering. (But the naive mapping system_cpu_id = slurm_cpu_id might hold.)
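
The same parsing can be written as a small script, which may be easier to reuse when building the Docker command. This is only a sketch under a few assumptions: the function names are made up for illustration, SLURM_CPU_BIND_LIST is assumed to contain one or more comma-separated hexadecimal masks, and the bit positions are assumed to coincide with the OS CPU numbers (which, as noted above, may not hold on every system). The Linux-only `os.sched_getaffinity(0)` call is included as a cross-check against what the kernel reports for the current process.

```python
import os


def mask_to_cpu_ids(mask):
    """Return the positions of the set bits in a hexadecimal mask string."""
    value = int(mask.strip(), 16)  # accepts both "33" and "0x33"
    return [i for i in range(value.bit_length()) if (value >> i) & 1]


def allocated_cpu_ids(environ=os.environ):
    """Union of all masks in SLURM_CPU_BIND_LIST (assumed comma-separated)."""
    raw = environ.get("SLURM_CPU_BIND_LIST", "")
    ids = set()
    for mask in raw.split(","):
        if mask.strip():  # skip empty entries, including an unset variable
            ids.update(mask_to_cpu_ids(mask))
    return sorted(ids)


def cpuset_argument(cpu_ids):
    """Format CPU ids as a sorted, comma-separated string."""
    return ",".join(str(i) for i in sorted(cpu_ids))


if __name__ == "__main__":
    print("from Slurm mask:", cpuset_argument(allocated_cpu_ids()))
    # Linux-only cross-check: the CPUs the kernel lets this process run on.
    if hasattr(os, "sched_getaffinity"):
        print("from OS affinity:", cpuset_argument(os.sched_getaffinity(0)))
```

If the two printed lists agree, the comma-separated string is in the form that `docker run` accepts for its `--cpuset-cpus` option.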