c++ intel icc xeon-phi intel-oneapi

Does mkl_vml_serv_threader in the gprof profile mean MKL is not running sequentially?


We're running an application that is in the process of being enhanced with MKL BLAS. We've been told not to hyperthread.

To keep the multithreaded (so-called "parallel"?) version from being considered during compilation, i.e. to disable threading and keep only MKL's sequential vectorization, we removed the threaded library from the FindMKL CMake file. The compiler was icc 2019.

To disable multithreading at runtime, we launched the tasks through Slurm with --threads-per-core=1 set in the batch file.

Still, we are not sure how to double-check that MKL is really running sequentially, so we collected a profile with gprof (summed over 4 cores on a single cluster node).

The following functions appear in the flat profile, although each consumes less than 0.3% of the time. Are they evidence that MKL is multithreading, i.e. "not running in sequential mode"?

mkl_vml_serv_threader_d_2iI_1oI

mkl_vml_serv_threader_d_1i_1o

mkl_vml_serv_threader_d_1iI_1oI

mkl_vml_serv_threader_d_2i_1o

Solution

  • By default, Intel® oneAPI Math Kernel Library (oneMKL) uses as many OpenMP threads as there are physical cores on the system and runs on all available physical cores, unless you restrict it with the options described below.

    Recent Intel compilers provide the option -qmkl=[lib], where lib selects which MKL library files are linked. The possible values are:

    parallel:

    Tells the compiler to link using the threaded libraries in oneMKL. This is the default if the option is specified with no lib.

    sequential:

    Tells the compiler to link using the sequential libraries in oneMKL.

    cluster:

    Tells the compiler to link using the cluster-specific libraries and the sequential libraries in oneMKL.

    So if you want MKL to run sequentially, link with -qmkl=sequential. Since you are using icc 2019, check icc --help for the exact option name; in that compiler version it is -mkl rather than -qmkl (i.e. -mkl=sequential). A minimal way to test the resulting build is sketched below.
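
    Here is a minimal sketch of such a check (the file name check_mkl.cpp and the compile lines are placeholders to adapt to your environment). It makes a single VML call and prints mkl_get_max_threads(); when linked against the sequential MKL library it should report 1 thread:

    ```cpp
    // check_mkl.cpp -- hypothetical file name, a sketch only.
    // Possible compile lines (adjust to your environment):
    //   icc  -mkl=sequential  check_mkl.cpp -o check_mkl    (icc 2019, classic flag)
    //   icx  -qmkl=sequential check_mkl.cpp -o check_mkl    (oneAPI compilers)
    #include <mkl.h>
    #include <cstdio>
    #include <vector>

    int main() {
        // With the sequential MKL library linked, this should report 1.
        std::printf("MKL max threads: %d\n", mkl_get_max_threads());

        // One VML call, so that something identifiable shows up in a
        // gprof profile or in MKL_VERBOSE output.
        const MKL_INT n = 1000000;
        std::vector<double> a(n, 1.0), r(n);
        vdExp(n, a.data(), r.data());          // element-wise exp()
        std::printf("r[0] = %f\n", r[0]);
        return 0;
    }
    ```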

    Additionally, you can use the oneMKL Link Line Advisor tool (https://www.intel.com/content/www/us/en/developer/tools/oneapi/onemkl-link-line-advisor.html?wapkw=link%20line%20advisor#gs.0myxfc), which shows the exact libraries to link for your specific use case.

    As mentioned in the comments, setting MKL_VERBOSE=1 prints, for every MKL call, the MKL version, the parameters passed to the call, the time the function took, and the NThr field, which indicates the number of threads used (see the given link for the remaining fields). Example: MKL_VERBOSE=1 ./a.out
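
    If you prefer to switch verbose mode on from inside the program rather than via the environment variable, a sketch using mkl_verbose() from the MKL service API could look like the following (the file name and matrix size are arbitrary). The NThr field printed for the dgemm call should read 1 when the sequential library is linked:

    ```cpp
    // verbose_check.cpp -- hypothetical file name, a sketch only.
    #include <mkl.h>
    #include <vector>

    int main() {
        // Equivalent to running with MKL_VERBOSE=1 in the environment.
        mkl_verbose(1);

        const MKL_INT n = 512;   // arbitrary problem size
        std::vector<double> A(n * n, 1.0), B(n * n, 2.0), C(n * n, 0.0);

        // A single BLAS call is enough: MKL prints one verbose line for it,
        // including the NThr field (expected to be 1 in sequential mode).
        cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                    n, n, n, 1.0, A.data(), n, B.data(), n, 0.0, C.data(), n);
        return 0;
    }
    ```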