The author recommends the following: https://github.com/xianyi/OpenBLAS
Setting the number of threads using environment variables
Environment variables are used to specify a maximum number of threads. For example,
export OPENBLAS_NUM_THREADS=4
export GOTO_NUM_THREADS=4
export OMP_NUM_THREADS=4
The priorities are OPENBLAS_NUM_THREADS > GOTO_NUM_THREADS > OMP_NUM_THREADS.
If you compile this library with USE_OPENMP=1, you should set the OMP_NUM_THREADS environment variable; OpenBLAS ignores OPENBLAS_NUM_THREADS and GOTO_NUM_THREADS when compiled with USE_OPENMP=1.
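To make the priority rule concrete, here is a minimal shell sketch of the order described above for a build without USE_OPENMP=1. The function name effective_blas_threads is a hypothetical helper for illustration only; it is not part of OpenBLAS and only mirrors the documented order.

```shell
# Hypothetical helper (not part of OpenBLAS): report which variable would
# take effect under the documented priority for a non-OpenMP build:
# OPENBLAS_NUM_THREADS > GOTO_NUM_THREADS > OMP_NUM_THREADS.
effective_blas_threads() {
    if [ -n "${OPENBLAS_NUM_THREADS:-}" ]; then
        echo "OPENBLAS_NUM_THREADS=${OPENBLAS_NUM_THREADS}"
    elif [ -n "${GOTO_NUM_THREADS:-}" ]; then
        echo "GOTO_NUM_THREADS=${GOTO_NUM_THREADS}"
    elif [ -n "${OMP_NUM_THREADS:-}" ]; then
        echo "OMP_NUM_THREADS=${OMP_NUM_THREADS}"
    else
        echo "unset"
    fi
}

# Show which setting wins in the current shell.
effective_blas_threads
```

Running it in the shell that will launch the program is a quick way to see which of the three variables is actually in play.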
When I use "export OPENBLAS_NUM_THREADS=16" in my main.cpp, I get an error about templates.
So, I changed my CMakeLists.txt file to include:
set($ENV{OPENBLAS_NUM_THREADS} 16)
This seemed to have no effect on the threading of my application. I only see 1 CPU core at 100%.
When I use "export OPENBLAS_NUM_THREADS=16" in my main.cpp, I get an error about templates.
OPENBLAS_NUM_THREADS is a variable read at runtime, so it should not affect the build of an application unless the build scripts explicitly use it, which is very unusual and a very bad idea (since the compile-time environment can differ from the run-time one).
Note that export OPENBLAS_NUM_THREADS=16 is a bash command and not something to put in a C++ file. Its purpose is to set the environment variable OPENBLAS_NUM_THREADS so that OpenBLAS can read it at runtime when your application calls a BLAS function. You should do something like:
# Build part
cmake # with the correct parameters
make
# Running part
export OPENBLAS_NUM_THREADS=4
./your_application # with the correct parameters
# Alternative solution:
# OPENBLAS_NUM_THREADS=4 ./your_application
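If you want to check that the variable really reaches the process you launch, here is a small sketch. It assumes a child shell as a stand-in for ./your_application (the real program is not run here) and compares the two ways of passing the variable.

```shell
# Stand-in for ./your_application: a child shell that prints what it inherits.
child='echo "child sees: ${OPENBLAS_NUM_THREADS:-unset}"'

unset OPENBLAS_NUM_THREADS
before=$(sh -c "$child")                            # nothing set yet

# One-shot form: the variable exists only for this single command.
one_shot=$(OPENBLAS_NUM_THREADS=4 sh -c "$child")
after_one_shot=$(sh -c "$child")                    # gone again afterwards

# Exported form: every command started from this shell inherits it.
export OPENBLAS_NUM_THREADS=4
exported=$(sh -c "$child")

printf '%s\n' "$before" "$one_shot" "$after_one_shot" "$exported"
```

The one-shot form is handy for trying different thread counts without changing the environment of your shell.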
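So, I changed my CMakeLists.txt file to include:
This should indeed have no effect, because the variable is not used at compile time. Setting an environment variable from CMake only changes the environment of the CMake process itself, not that of the program you run later.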
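The underlying reason is that a process cannot change the environment of the shell that started it. A minimal sketch, assuming a child shell that plays the role of the cmake process:

```shell
unset OPENBLAS_NUM_THREADS

# The child shell plays the role of the cmake process: it sets the variable
# in its own environment and then exits.
sh -c 'export OPENBLAS_NUM_THREADS=16'

# Back in the parent shell, where the application would be launched,
# the variable was never set.
in_parent="${OPENBLAS_NUM_THREADS:-unset}"
echo "after the child exits: $in_parent"
```

So whatever the build step does to its own environment, the variable still has to be set in the shell (or launcher) that actually starts the application.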
I only see 1 CPU core at 100%
Note that setting OPENBLAS_NUM_THREADS may not be enough to use multiple threads in practice. If your matrices are small, then consider reading this recent post about how OpenBLAS works with multiple threads.