I am using OpenBLAS on AMD EPYC 'Zen' CPUs. To install it I simply ran make TARGET=ZEN
I have a system with 2x EPYC 7601 (i.e. 2 x 32 cores) and I can run across all cores and get an OK GFLOPS number from DGEMM using export OMP_NUM_THREADS=64
But I am now trying to pin it to a smaller set of cores, just 2 cores, with 1 core on 1 socket and the second core on the other socket. So I set
1) export OMP_NUM_THREADS=2; export GOMP_CPU_AFFINITY="0 32" but it always dumps the 2 threads onto the first 2 cores.
2) I have logged out, logged back in and tried export OMP_NUM_THREADS=2; numactl -C 0,31 ./mt-dgemm but again it dumps them onto cores 0 and 1
3) I have logged out, logged back in and tried export OMP_NUM_THREADS=2; taskset -c 0,31 ./mt-dgemm but again it dumps them onto cores 0 and 1
But if I try just a single core (OMP_NUM_THREADS=1) and then use taskset or numactl with the core ID changed to 4 or 8 or 52 or whatever, it successfully pins that single thread to the core I requested.
Does anyone know what I am doing wrong when I try to pin 2 or more threads to specific CPU IDs?
Many thanks!
(I am using CentOS 7.4 with GCC 7.2)
To answer my own question: when building OpenBLAS, use the NO_AFFINITY=1 flag to disable the automatic affinity witnessed above. Thus:
make TARGET=ZEN NO_AFFINITY=1
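
In case it helps anyone checking the same thing, here is a rough sketch of how the per-thread placement can be inspected after a rebuild. It assumes Linux with /proc mounted, and show_thread_affinity is just a helper name I made up for illustration, not part of OpenBLAS or any other tool.

```shell
#!/bin/sh
# Hypothetical helper: print the CPUs each thread of a process is allowed to run on.
# Assumes Linux, where /proc/<pid>/task/<tid>/status has a Cpus_allowed_list line.
show_thread_affinity() {
    pid="$1"
    for task in /proc/"$pid"/task/*; do
        tid="${task##*/}"
        # A thread may exit between the glob and the read, so ignore read errors.
        cpus=$(awk '/^Cpus_allowed_list:/ {print $2}' "$task/status" 2>/dev/null)
        if [ -n "$cpus" ]; then
            echo "tid=$tid cpus_allowed=$cpus"
        fi
    done
    return 0
}
```

To use it, start ./mt-dgemm in the background and then run show_thread_affinity $! while it is still computing. If the pinning is respected, the reported CPU lists should stay within the cores that were requested (e.g. 0 and 32) rather than 0 and 1.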