Tags: queue, slurm, supercomputers

Slurm: how to use all cores available on a node?


I'm working with a large computing cluster, managed by Slurm, that has four different subsections: we'll call them C1, C2, C3, and C4. Nodes in C1 and C2 have 28 cores, whereas those in C3 and C4 have 40 and 52 cores, respectively. I would like to use all the cores on whatever node my job lands on, but when I submit a job to the queue, I have no idea which subsection it will be assigned to, and therefore don't know how many cores will be available. Is there a variable in Slurm that I can plug into --ntasks-per-node to tell it to use all available cores on the node?


Solution

  • If you request a full node with --nodes=1 --exclusive, you will get access to all of its CPUs (which you can check with cat /proc/$$/status | grep Cpus). The number of CPUs allocated is given by the SLURM_JOB_CPUS_PER_NODE environment variable.

    But the number of tasks will be one, so you may have to adjust how you start your program and set the number of CPUs explicitly; for instance, with an Open MPI program a.out (a fuller batch-script sketch follows the command):

    mpirun -np $SLURM_JOB_CPUS_PER_NODE ./a.out
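
    A minimal batch script putting the pieces together might look like the sketch below. The job name and a.out are placeholders, and the comments flag one caveat about the variable's format:

    #!/bin/bash
    #SBATCH --job-name=full-node   # placeholder name
    #SBATCH --nodes=1              # exactly one node, whichever subsection it lands in
    #SBATCH --exclusive            # no sharing, so every CPU on the node is allocated to us

    # With --nodes=1 this is a single integer (28, 40, or 52 on this cluster);
    # on multi-node jobs it can take compressed forms such as "40(x2)".
    echo "CPUs allocated on this node: $SLURM_JOB_CPUS_PER_NODE"

    # Launch one MPI rank per allocated CPU.
    mpirun -np $SLURM_JOB_CPUS_PER_NODE ./a.out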