c++ multithreading openblas

C++ how to set environment variable so OpenBLAS runs multithreaded


The author recommends the following: https://github.com/xianyi/OpenBLAS

Setting the number of threads using environment variables

Environment variables are used to specify a maximum number of threads. For example,

export OPENBLAS_NUM_THREADS=4
export GOTO_NUM_THREADS=4
export OMP_NUM_THREADS=4

The priorities are OPENBLAS_NUM_THREADS > GOTO_NUM_THREADS > OMP_NUM_THREADS.

If you compile this library with USE_OPENMP=1, you should set the OMP_NUM_THREADS environment variable; OpenBLAS ignores OPENBLAS_NUM_THREADS and GOTO_NUM_THREADS when compiled with USE_OPENMP=1.

When I use "export OPENBLAS_NUM_THREADS=16" in my main.cpp, I get an error about templates.

So, I changed my CMakeLists.txt file to include:

set($ENV{OPENBLAS_NUM_THREADS} 16)

This seemed to have no effect on the threading of my application. I only see 1 CPU core at 100%.


Solution

  • When I use "export OPENBLAS_NUM_THREADS=16" in my main.cpp, I get an error about templates.

    OPENBLAS_NUM_THREADS is a runtime variable, so it should not affect how an application is built unless the build scripts explicitly use it, which is very unusual and a bad idea (the compile-time environment can differ from the run-time one).

    Note that export OPENBLAS_NUM_THREADS=16 is a shell (bash) command, not something to put in a C++ file, which is why the compiler rejects it. Its purpose is to set the environment variable OPENBLAS_NUM_THREADS so that OpenBLAS can read it at runtime when your application calls a BLAS function. You should do something like:

    # Build part
    cmake  # with the correct parameters
    make
    
    # Running part
    export OPENBLAS_NUM_THREADS=4
    ./your_application  # with the correct parameters
    
    # Alternative solution:
    # OPENBLAS_NUM_THREADS=4 ./your_application
    

    So, I changed my CMakeLists.txt file to include:

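    This should indeed have no effect, because the variable is not used at compile time. Even with the correct syntax, set(ENV{OPENBLAS_NUM_THREADS} 16), CMake only changes the environment of its own configure process, not the environment your application later runs in.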

    I only see 1 CPU core at 100%

    Note that setting OPENBLAS_NUM_THREADS may not be enough to use multiple threads in practice. If your matrices are small, consider reading this recent post about how OpenBLAS works with multiple threads.