python-3.x compilation windows-10 clang-cl pythran

Setting up Pythran for compiling on Windows with clang-cl.exe and OpenMP working: need a way to pass compiler arguments

I'm using Pythran to compile Python code into C/C++ with OpenMP support on Windows. Now the documentation isn't great for Windows - it states: "Windows support is on going and only targets Python 3.5+ with either Visual Studio 2017 or, better, clang-cl. Note that using clang-cl.exe is the default setting. It can be changed through the CXX and CC environment variables."

From playing around I found you MUST use clang-cl.exe or the code won't compile (MSVC doesn't like it).

So the preferred compiler is clang-cl.exe, the "drop-in" replacement for cl.exe. I installed Clang 12 from the Visual Studio 2019 setup by selecting "C++ Clang tools for Windows," and now I have C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Tools\Llvm\x64\bin\clang-cl.exe as well as the LLVM linker lld-link.exe. Since clang-cl.exe is the default, I don't need to change any setup files; I just run vcvarsall.bat before Pythran so the compiler directory is on the path. (I noticed later that getting lld-link.exe used requires some hacking of distutils' _msvccompiler.py: switch link.exe to lld-link.exe and comment out the '/LTCG' flag, since Clang doesn't have that option. Then it works.) But still no OpenMP...

I compiled one of the examples in an Anaconda virtual environment with pip-installed NumPy and SciPy (OpenBLAS backend), since MKL support is barely documented. It needed the pythran-openblas package, so I pip installed that as well. It compiled fine with clang-cl and I could import it with no problem. I found that [Python]\Lib\site-packages\pythran\pythran-win32.cfg has a cflags option where I can put the correct compiler arguments, like: -Xclang -fopenmp -march=ivybridge. When running pythran [script.py], all those flags are passed through correctly (the default flags aren't correct). BUT... this example from the docs is still not running in parallel.

I found on Stack Exchange that clang-cl -cc1 --help outputs all the arguments Clang can handle. Under OpenMP it states: -fopenmp Parse OpenMP pragmas and generate parallel code. So my guess is that the example given in the Pythran documentation has no OpenMP pragmas to parallelize. Why would they do that? No idea, as they show the example being made dramatically faster via OpenMP, but I can't reproduce it on Windows. I have 6 cores / 12 logical processors, so I should see a speedup.

Does anyone have another OpenMP example I can try this out on? Or has anyone solved this mystery of using OpenMP another way?

Much appreciated!

Solution

The Pythran project maintainer got back to me after I emailed him directly. It turns out OpenMP is only supported via explicit #omp directives. When the docs were written, Pythran would infer parallel routines on its own, but it no longer does. So to convert the example to OpenMP, a few changes are required:

#pythran export arc_distance(float[], float[], float[], float[])
import numpy as np
def arc_distance(theta_1, phi_1, theta_2, phi_2):
    """
    Calculates the pairwise arc distance
    between all points in vector a and b.
    """
    size = theta_1.size
    distance_matrix=np.empty_like(theta_1)
    #omp parallel for
    for i in range(size):
        temp = (np.sin((theta_2[i]-theta_1[i])/2)**2 + np.cos(theta_1[i])*np.cos(theta_2[i]) * np.sin((phi_2[i]-phi_1[i])/2)**2)
        distance_matrix[i] = 2 * np.arctan2(np.sqrt(temp), np.sqrt(1-temp))
    return distance_matrix
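
Because #omp is just a comment to the regular interpreter, the function above can be sanity-checked as plain Python before compiling it. The following is a minimal sketch, assuming only NumPy; the name arc_distance_vectorized and the random test data are mine, added for illustration, and are not part of the Pythran example:

```python
import numpy as np

def arc_distance(theta_1, phi_1, theta_2, phi_2):
    """Loop version from above; '#omp' is an ordinary comment to CPython."""
    size = theta_1.size
    distance_matrix = np.empty_like(theta_1)
    #omp parallel for
    for i in range(size):
        temp = (np.sin((theta_2[i] - theta_1[i]) / 2) ** 2
                + np.cos(theta_1[i]) * np.cos(theta_2[i])
                * np.sin((phi_2[i] - phi_1[i]) / 2) ** 2)
        distance_matrix[i] = 2 * np.arctan2(np.sqrt(temp), np.sqrt(1 - temp))
    return distance_matrix

def arc_distance_vectorized(theta_1, phi_1, theta_2, phi_2):
    """Hypothetical reference: the same formula with whole-array operations."""
    temp = (np.sin((theta_2 - theta_1) / 2) ** 2
            + np.cos(theta_1) * np.cos(theta_2)
            * np.sin((phi_2 - phi_1) / 2) ** 2)
    return 2 * np.arctan2(np.sqrt(temp), np.sqrt(1 - temp))

# Four random 1-D float64 arrays, matching the float[] export signature.
rng = np.random.default_rng(0)
theta_1, phi_1, theta_2, phi_2 = rng.random((4, 1000))

loop_result = arc_distance(theta_1, phi_1, theta_2, phi_2)
reference = arc_distance_vectorized(theta_1, phi_1, theta_2, phi_2)
results_match = np.allclose(loop_result, reference)
print("loop and vectorized results match:", results_match)
```

If the two results agree in plain Python, any difference after compiling points at the build rather than at the loop itself.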

BUT... there are other undocumented compiler arguments that need to be passed to get an OpenBLAS-backed OpenMP module working, which took me HOURS to figure out. Here they are:

Pythran OpenBLAS Windows 10 Settings:

Find the file [Python]\Lib\site-packages\pythran\pythran-win32.cfg

Add to library_dirs: 'C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Tools\Llvm\x64\lib'

Add to cflags: -Xclang -fopenmp

Add to ldflags: \libiomp5md.lib

Set blas to: blas=pythran-openblas

Then it should compile fine with: pythran -v arc_distance.py. The -v flag (verbose compiler mode) is very helpful for finding issues, but not needed.
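
To make the four OpenBLAS settings easier to copy around, here is a rough sketch that renders them as a config fragment using Python's standard configparser. Two things are assumptions on my part and not from the steps above: the [compiler] section name (that is where the stock Pythran config files keep these keys, as far as I can tell), and the helper names openblas_openmp_settings and render_cfg, which are made up for this example. It only prints the values to add; it does not merge them with whatever is already in your pythran-win32.cfg.

```python
import configparser
import io

# Path taken from the steps above (quotes included); adjust to your install.
LLVM_LIB_DIR = (r"'C:\Program Files (x86)\Microsoft Visual Studio"
                r"\2019\BuildTools\VC\Tools\Llvm\x64\lib'")

def openblas_openmp_settings():
    """Return the four OpenBLAS/OpenMP values listed above, keyed by option name."""
    return {
        "library_dirs": LLVM_LIB_DIR,
        "cflags": "-Xclang -fopenmp",
        "ldflags": r"\libiomp5md.lib",
        "blas": "pythran-openblas",
    }

def render_cfg(settings, section="compiler"):
    """Render settings as INI text. The section name is an assumption."""
    # interpolation=None keeps any '%' in a value from being treated specially.
    parser = configparser.ConfigParser(interpolation=None)
    parser[section] = settings
    buffer = io.StringIO()
    parser.write(buffer, space_around_delimiters=False)
    return buffer.getvalue()

cfg_text = render_cfg(openblas_openmp_settings())
print(cfg_text)
```

The printed text can be compared by eye against the real file, which is safer than overwriting a file inside site-packages from a script.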

Pythran Intel MKL Windows 10 Settings (Anaconda3 default libraries):

I also decided to try making this work on default Anaconda3, where NumPy, SciPy, etc. are all compiled with MKL. My company uses Anaconda3, so everyone has Intel MKL already. Like the OpenBLAS settings, the MKL settings for Windows aren't documented either, so I figured them out:

Find the file [Python]\Lib\site-packages\pythran\pythran-win32.cfg (most likely [Python] is C:\Users\[username]\Anaconda3)

Add to include_dirs: 'C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Tools\Llvm\x64\lib', '[Python]\Library\include'

Add to cflags: -Xclang -fopenmp

Add to ldflags: \libomp.lib

Set blas to: blas=mkl

You'll notice some strange things here compared to the OpenBLAS settings. The library path isn't populated; instead the LLVM lib directory has to go in the include path (don't ask why, I don't know). The OpenMP library is also different: libomp.lib instead of libiomp5md.lib. Again, I don't know why the one that works with OpenBLAS refuses to work with Intel MKL. But anyhow, that will give you Pythran with OpenMP on an Intel MKL based system.
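
To check that either configuration really runs in parallel, timing the compiled function is the simplest test I know of. Below is a small sketch of a timing helper. The names best_time and busy_sum are hypothetical, and busy_sum is only a stand-in workload so the snippet runs without a compiled module:

```python
import time

def best_time(fn, *args, repeat=5):
    """Return the fastest wall-clock time in seconds over `repeat` calls of fn(*args)."""
    timings = []
    for _ in range(repeat):
        start = time.perf_counter()
        fn(*args)
        timings.append(time.perf_counter() - start)
    return min(timings)

def busy_sum(n):
    """Stand-in workload; swap in the compiled function when testing for real."""
    return sum(i * i for i in range(n))

elapsed = best_time(busy_sum, 100_000)
print(f"best of 5: {elapsed:.6f} s")
```

For a real check, replace busy_sum with the compiled arc_distance and four large float arrays, then run the script twice: once with the standard OpenMP environment variable OMP_NUM_THREADS set to 1, and once with it unset. If the module was built with OpenMP, the second run should be noticeably faster on a multi-core machine; if both times are about the same, the #omp directive is probably not being picked up.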