
In Slurm, lscpu and slurmd -C report different CPU counts, so resources are not usable


When I run "lscpu", it shows:

Architecture:             x86_64
  CPU op-mode(s):         32-bit, 64-bit
  Address sizes:          45 bits physical, 48 bits virtual
  Byte Order:             Little Endian
CPU(s):                   4
  On-line CPU(s) list:    0-3
Vendor ID:                GenuineIntel
  Model name:             13th Gen Intel(R) Core(TM) i7-1360P
    CPU family:           6
    Model:                186
    Thread(s) per core:   1
    Core(s) per socket:   1
    Socket(s):            4
    Stepping:             2

But when I run "slurmd -C", it shows:

CPUs=1 Boards=1 SocketsPerBoard=1 CoresPerSocket=1 ThreadsPerCore=1

It reports a different number of CPUs. When I set CPUs=4 in the slurm.conf file, the node stops working and goes into the INVAL state. So I can only use one core even though my computer has 4 cores.
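To make the mismatch concrete: the lscpu topology multiplies out to Socket(s) × Core(s) per socket × Thread(s) per core = 4 × 1 × 1 = 4 CPUs, while slurmd -C reports Boards × SocketsPerBoard × CoresPerSocket × ThreadsPerCore = 1 × 1 × 1 × 1 = 1 CPU. Below is a minimal Python sketch, assuming both outputs are pasted in as text, that parses them and compares the totals. The function names are hypothetical helpers for illustration, not part of Slurm or util-linux.

```python
# Hypothetical helpers: compare the CPU topology reported by `lscpu`
# with the one reported by `slurmd -C`, using the outputs from above.

SAMPLE_LSCPU = """\
CPU(s):                   4
  On-line CPU(s) list:    0-3
    Thread(s) per core:   1
    Core(s) per socket:   1
    Socket(s):            4
"""

SAMPLE_SLURMD_C = "CPUs=1 Boards=1 SocketsPerBoard=1 CoresPerSocket=1 ThreadsPerCore=1"


def parse_lscpu(text: str) -> dict[str, str]:
    """Turn 'Key:   value' lines into a dict, ignoring indentation."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def parse_slurmd_c(line: str) -> dict[str, str]:
    """Turn 'Key=value Key=value ...' tokens into a dict."""
    return dict(tok.split("=", 1) for tok in line.split() if "=" in tok)


def lscpu_cpus(fields: dict[str, str]) -> int:
    """Logical CPUs implied by the lscpu topology fields."""
    return (int(fields["Socket(s)"])
            * int(fields["Core(s) per socket"])
            * int(fields["Thread(s) per core"]))


def slurmd_cpus(fields: dict[str, str]) -> int:
    """Logical CPUs implied by the slurmd -C topology fields."""
    return (int(fields.get("Boards", "1"))
            * int(fields["SocketsPerBoard"])
            * int(fields["CoresPerSocket"])
            * int(fields["ThreadsPerCore"]))


if __name__ == "__main__":
    hw = lscpu_cpus(parse_lscpu(SAMPLE_LSCPU))
    sl = slurmd_cpus(parse_slurmd_c(SAMPLE_SLURMD_C))
    # prints "lscpu: 4 CPUs, slurmd -C: 1 CPUs"
    print(f"lscpu: {hw} CPUs, slurmd -C: {sl} CPUs")
    if hw != sl:
        print("mismatch between lscpu and slurmd -C topology")
```

With the outputs above, the sketch reports 4 CPUs from lscpu and 1 from slurmd -C.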

I tried Open MPI, and it uses all 4 cores, so I guess the cores themselves are not the problem.

I checked whether I have a NUMA node with "lscpu | grep -i numa", and it shows:

NUMA node(s): 1
NUMA node0 CPU(s): 0 - 3

So it seems my computer does have a NUMA node.

With hwloc 1.x, this could be addressed with Ignore_NUMA, but with hwloc 2.x, Ignore_NUMA does not work.

Is there another way to handle this problem?


Solution

  • It could be that Slurm was compiled against a version of the hwloc library that does not recognise your CPU, which contains 4 high-performance cores and 8 low-power cores.

    What you can do is define CPUs=4 on the node line, remove the other parameters (Boards=1 SocketsPerBoard=1 CoresPerSocket=1 ThreadsPerCore=1), and add config_overrides to the SlurmdParameters configuration option. See also this other option, which seems to be related to the type of CPU you have.
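The suggested change can be sketched as a small Python helper that takes the slurmd -C line, drops the Boards/SocketsPerBoard/CoresPerSocket/ThreadsPerCore keys, forces CPUs=4, and emits the SlurmdParameters=config_overrides line alongside it. This is an illustration under assumptions: the function names and the node name "mynode" are hypothetical, and any other fields slurmd -C prints (for example RealMemory) are simply carried over unchanged.

```python
# Hypothetical helper: build slurm.conf lines following the answer's advice
# (set CPUs explicitly, drop the detected topology, enable config_overrides).

TOPOLOGY_KEYS = ("Boards", "SocketsPerBoard", "CoresPerSocket", "ThreadsPerCore")


def overridden_node_line(slurmd_c_line: str, cpus: int, node_name: str) -> str:
    """Return a NodeName line without topology keys and with CPUs=<cpus>."""
    fields = dict(tok.split("=", 1) for tok in slurmd_c_line.split() if "=" in tok)
    fields.pop("NodeName", None)  # the name is supplied explicitly below
    for key in TOPOLOGY_KEYS:
        fields.pop(key, None)  # remove the (wrongly) detected topology
    fields["CPUs"] = str(cpus)  # force the CPU count seen by lscpu
    rest = " ".join(f"{k}={v}" for k, v in fields.items())
    return f"NodeName={node_name} {rest}"


def config_lines(slurmd_c_line: str, cpus: int, node_name: str) -> list[str]:
    """Node definition plus the option that keeps the node out of INVAL."""
    return [
        overridden_node_line(slurmd_c_line, cpus, node_name),
        "SlurmdParameters=config_overrides",
    ]


if __name__ == "__main__":
    detected = "CPUs=1 Boards=1 SocketsPerBoard=1 CoresPerSocket=1 ThreadsPerCore=1"
    for line in config_lines(detected, 4, "mynode"):
        print(line)
    # prints:
    # NodeName=mynode CPUs=4
    # SlurmdParameters=config_overrides
```

SlurmdParameters takes a comma-separated list, so if it already contains other options in your slurm.conf, append config_overrides to that list rather than adding a second SlurmdParameters line.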