When compiling my program using Caffe2 I get these warnings:
[E init_intrinsics_check.cc:43] CPU feature avx is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
[E init_intrinsics_check.cc:43] CPU feature avx2 is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
[E init_intrinsics_check.cc:43] CPU feature fma is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
Since I do want multi-threading support in Caffe2, I searched for what to do. I found that Caffe2 has to be re-compiled, either by passing some arguments when invoking cmake or by setting options in the CMakeLists.txt file.
Since I had already installed pytorch in a conda env, I first uninstalled Caffe2 with:
pip uninstall -y caffe2
Then I followed the instructions from the Caffe2 docs to build it from source. I first installed the dependencies as indicated, then downloaded pytorch inside my conda env with:
git clone https://github.com/pytorch/pytorch.git && cd pytorch
git submodule update --init --recursive
I think this is the moment to change the pytorch/caffe2/CMakeLists.txt file that was just downloaded. I have read that, in order to enable multi-threading support, it is sufficient to enable the option USE_NATIVE_ARCH inside this CMakeLists.txt; however, I'm not able to find such an option where I'm looking. Maybe I'm doing something wrong. Any thoughts? Thanks.
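Rather than scrolling through the CMake files by hand, the option can be searched for across the whole checkout. A minimal sketch, assuming the current directory is the root of the freshly cloned pytorch repository:

```shell
# Search every CMakeLists.txt in the tree for the option name;
# -r recurses, -n prints the line number of each match.
grep -rn --include=CMakeLists.txt "USE_NATIVE_ARCH" .
```

If the option exists anywhere in the tree, this prints the file and line where it is declared, which also reveals in which directory's CMakeLists.txt it actually lives.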
Here are some details about my platform:
python version: 3.8.5
UPDATE:
To answer Nega, this is what I got:
python3 -c 'import torch; print(torch.__config__.parallel_info())'
ATen/Parallel:
at::get_num_threads() : 1
at::get_num_interop_threads() : 4
OpenMP not found
Intel(R) Math Kernel Library Version 2020.0.2 Product Build 20200624 for Intel(R) 64 architecture applications
mkl_get_max_threads() : 4
Intel(R) MKL-DNN v0.21.1 (Git Hash 7d2fd500bc78936d1d648ca713b901012f470dbc)
std::thread::hardware_concurrency() : 8
Environment variables:
OMP_NUM_THREADS : [not set]
MKL_NUM_THREADS : [not set]
ATen parallel backend: OpenMP
UPDATE 2:
It turned out that the Clang that comes with Xcode doesn't support OpenMP. The gcc that I was using was just a symlink to Clang. In fact, after running gcc --version I got:
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/c++/4.2.1
Apple clang version 12.0.0 (clang-1200.0.32.29)
Target: x86_64-apple-darwin20.3.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
I installed gcc-10 from Homebrew and set an alias with alias gcc='gcc-10'. Now this is what I get from gcc --version:
gcc-10 (Homebrew GCC 10.2.0_4) 10.2.0
Copyright (C) 2020 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
I've also tried a simple Hello World for OpenMP using 8 threads, and everything seems to be working. However, after re-running the command:
python3 -c 'import torch; print(torch.__config__.parallel_info())'
I get the same outcome. Any thoughts?
AVX, AVX2, and FMA are CPU instruction sets and are not related to multi-threading. If the pip package for pytorch/caffe2 used these instructions on a CPU that didn't support them, the software wouldn't work. PyTorch installed via pip does come with multi-threading enabled, though. You can confirm this with torch.__config__.parallel_info():
❯ python3 -c 'import torch; print(torch.__config__.parallel_info())'
ATen/Parallel:
at::get_num_threads() : 6
at::get_num_interop_threads() : 6
OpenMP 201107 (a.k.a. OpenMP 3.1)
omp_get_max_threads() : 6
Intel(R) Math Kernel Library Version 2020.0.1 Product Build 20200208 for Intel(R) 64 architecture applications
mkl_get_max_threads() : 6
Intel(R) MKL-DNN v1.6.0 (Git Hash 5ef631a030a6f73131c77892041042805a06064f)
std::thread::hardware_concurrency() : 12
Environment variables:
OMP_NUM_THREADS : [not set]
MKL_NUM_THREADS : [not set]
ATen parallel backend: OpenMP
That being said, if you still want to continue building pytorch and caffe2 from source, the flag you're looking for, USE_NATIVE_ARCH, is in pytorch/CMakeLists.txt, one level up from caffe2. Edit that file and change USE_NATIVE_ARCH to ON, then continue building pytorch with python3 setup.py build. Note that USE_NATIVE_ARCH doesn't do what you might think: it only allows MKL-DNN to be built with CPU-native optimization flags. It does not trickle down to caffe2 (except where caffe2 uses MKL-DNN, obviously).
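One more note on the gcc alias from Update 2: a shell alias only affects the interactive shell, so python3 setup.py build and the CMake it spawns will still pick up Apple's Clang. A sketch of how the rebuild might be driven instead, assuming the Homebrew compilers are on PATH (whether setup.py honors USE_NATIVE_ARCH as an environment variable may depend on the pytorch version; editing CMakeLists.txt as described above works regardless):

```shell
# Aliases are not inherited by child processes, so point the build
# at the Homebrew compilers explicitly via the CC/CXX variables,
# which CMake and setup.py do consult.
export CC=gcc-10
export CXX=g++-10

# Then rebuild from the pytorch source root.
USE_NATIVE_ARCH=ON python3 setup.py build
```

With CC/CXX exported, the configure step should report GNU GCC as the compiler and find OpenMP, instead of silently falling back to Apple's Clang.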