Like many others, I've bought myself a new Ryzen CPU. I need to use Anaconda Python for my PhD (together with TensorFlow etc.). Since Anaconda now comes pre-packaged with MKL, which is slow on AMD CPUs, what is the best way to set up an Anaconda environment with OpenBLAS and link NumPy and scikit-learn against it, while keeping all other packages the same?
I've found the following posts, which all point to installing some packages one way or another.
https://anaconda.org/anaconda/nomkl
This Reddit post has a much more thorough explanation of what's going on, but the fix is a one-liner in your terminal that forces MKL to use its AVX2 code path regardless of the CPU vendor, since MKL does nasty things on non-Intel devices: https://www.reddit.com/r/MachineLearning/comments/f2pbvz/discussion_workaround_for_mkl_on_amd/
WINDOWS:
Open a command prompt (CMD) with admin rights and type:
setx /M MKL_DEBUG_CPU_TYPE 5
This makes the change permanent and available to ALL programs using MKL on your system until you delete the variable again. Note that setx only affects newly started processes, so open a new command prompt before running your script.
LINUX:
Simply type in a terminal:
export MKL_DEBUG_CPU_TYPE=5
before running your script from the same instance of the terminal.
Permanent solution for Linux:
echo 'export MKL_DEBUG_CPU_TYPE=5' >> ~/.profile
This applies the setting to every new login session for your user.
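If you would rather not touch your shell or system settings, the same variable can be set from inside Python, as long as that happens before NumPy (and therefore MKL) is first imported. The following is only a minimal sketch: the helper name ensure_mkl_override is made up for illustration, and it assumes MKL reads the variable when the library is loaded.

```python
import os


def ensure_mkl_override(environ=os.environ, value="5"):
    """Set MKL_DEBUG_CPU_TYPE if it is not already set.

    Hypothetical helper for illustration. Returns the value that is in
    effect afterwards, so an existing setting is never overwritten.
    """
    return environ.setdefault("MKL_DEBUG_CPU_TYPE", value)


# Call this at the very top of your script, before any import that pulls
# in NumPy, SciPy, scikit-learn or TensorFlow.
effective = ensure_mkl_override()
print("MKL_DEBUG_CPU_TYPE =", effective)

# import numpy as np   # only import the numerical stack after this point
```

This only affects the current Python process and anything it launches, which makes it easy to undo: remove the call and nothing else on the system has changed.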
Some highlights, since you can click the link to read the entire thing if interested:
"However, the numerical lib that comes with many of your packages by default is the Intel MKL. The MKL runs notoriously slow on AMD CPUs for some operations. This is because the Intel MKL uses a discriminative CPU Dispatcher that does not use efficient codepath according to SIMD support by the CPU, but based on the result of a vendor string query. If the CPU is from AMD, the MKL does not use SSE3-SSE4 or AVX1/2 extensions but falls back to SSE no matter whether the AMD CPU supports more efficient SIMD extensions like AVX2 or not.
The method provided here enforces AVX2 support by the MKL, independent of the vendor string result and takes less than a minute to apply. If you have an AMD CPU that is based on the Zen/Zen+/Zen2 µArch Ryzen/Threadripper, this will boost your performance tremendously."
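Whether this helps on your machine is easy to measure, and worth measuring: the variable is undocumented, and there are reports that newer MKL releases no longer honour it. Below is a rough sketch that runs the same workload in two child processes, once without the variable and once with MKL_DEBUG_CPU_TYPE=5. The function names and the matrix size are my own assumptions, the default workload needs NumPy installed, and the timings include interpreter start-up and import time, so treat the numbers as indicative only.

```python
import os
import subprocess
import sys
import time

# Assumed workload: a few large matrix products, which go through the BLAS
# library NumPy is linked against. Requires NumPy in the tested interpreter.
NUMPY_WORKLOAD = (
    "import numpy as np\n"
    "a = np.random.rand(2000, 2000)\n"
    "for _ in range(5):\n"
    "    a @ a\n"
)


def child_env(cpu_type=None, base=None):
    """Copy an environment, dropping or setting MKL_DEBUG_CPU_TYPE."""
    env = dict(os.environ if base is None else base)
    env.pop("MKL_DEBUG_CPU_TYPE", None)
    if cpu_type is not None:
        env["MKL_DEBUG_CPU_TYPE"] = str(cpu_type)
    return env


def time_child(code, env):
    """Run `code` in a fresh interpreter and return wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run([sys.executable, "-c", code], env=env, check=True)
    return time.perf_counter() - start


def compare(code=NUMPY_WORKLOAD):
    """Time the workload without and with the override."""
    return {
        "default": time_child(code, child_env()),
        "MKL_DEBUG_CPU_TYPE=5": time_child(code, child_env(5)),
    }
```

Calling compare() from the Anaconda environment you care about returns a dictionary with the two timings in seconds. If they are about the same, either your NumPy is not using MKL or your MKL version ignores the variable, and switching the environment to OpenBLAS (for example via the nomkl package linked in the question) is the more reliable route.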