python numpy anaconda pybind11 arpack

Custom C++ library dependent on ARPACK wrapped in pybind11 segfaults when NumPy is also imported


I am writing a custom C++ library that performs numerical computations with ARPACK-NG. The function is wrapped with pybind11 so that it can be called from a Python package. However, I am seeing strange behavior.

Overview of the issue

When NumPy is imported before my method is called, a segfault occurs.

import numpy as np
from mylib import mymethod

mymethod() # Segfault

The same segfault occurs if the import order is swapped.

from mylib import mymethod
import numpy as np

mymethod() # Segfault

When NumPy is imported after my method is called, everything works fine.

from mylib import mymethod
mymethod() # Works fine

import numpy as np

# Further calls to NumPy or my library also work.
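Since a segfault kills the interpreter, it can help to run each import order in its own child process and compare exit statuses. The following is only a minimal sketch under some assumptions: it relies on the question's `mylib` and `mymethod` being importable, it assumes a POSIX system (where a negative exit status -N means "killed by signal N"), and the helper names `run_snippet`, `CASES` and `report` are made up for illustration.

```python
import subprocess
import sys


def run_snippet(code: str) -> int:
    """Run `code` in a fresh interpreter and return its exit status.

    On POSIX, a negative value -N means the child was killed by signal N
    (for example, -11 for SIGSEGV).
    """
    result = subprocess.run([sys.executable, "-c", code])
    return result.returncode


# The three import orders described above. `mylib` and `mymethod` are the
# names used in the question; the dictionary keys are just labels.
CASES = {
    "numpy imported first": (
        "import numpy as np\nfrom mylib import mymethod\nmymethod()"
    ),
    "mylib imported first, numpy before the call": (
        "from mylib import mymethod\nimport numpy as np\nmymethod()"
    ),
    "mymethod called before numpy is imported": (
        "from mylib import mymethod\nmymethod()\nimport numpy as np"
    ),
}


def report() -> None:
    """Print the exit status of every case, one per line."""
    for label, code in CASES.items():
        print(f"{label}: exit status {run_snippet(code)}")
```

Calling `report()` inside the affected environment should print one line per case. Running each case in a separate process also keeps libraries loaded by one case from influencing the next.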

GDB Trace

The backtrace looks like this.

#0  0x00007fffec59d2ef in mkl_blas.cdotc () from /home/myname/.conda/envs/mylib/lib/./libmkl_intel_lp64.so.1
#1  0x00007ffff7281974 in cneupd_ () from /home/myname/.conda/envs/mylib/lib/libarpack.so.2
#2  0x00007ffff72af228 in cneupd_c () from /home/myname/.conda/envs/mylib/lib/libarpack.so.2
#3  0x00007ffff76b25cd in void complex_symmetric_runner<float>(double const&) ()
   from /home/myname/Documents/mylib/build/lib.linux-x86_64-3.9/mylib/libmylib.so
#4  0x00007ffff76b102b in mymethod() ()

Replicable example

The test code is essentially the same as the C++ example provided by ARPACK-NG, with the main method replaced by mymethod(). The minimal binding code is

#include<pybind11/pybind11.h>

// ARPACK-NG C++ example code goes here. The main method is replaced with mymethod so it can be called from pybind11.

void mymethod(){
   // ...Contents of the main() function in the example ARPACK-NG code...
}

PYBIND11_MODULE(mylib, m){
    m.def("mymethod", mymethod);
}

My guess at what the issue is.

I believe there is some issue with how NumPy and MKL are initialized, similar to this issue. From what I've gathered, NumPy links to MKL dynamically through libmkl_rt.so (the mkl_rt library), as shown in the NumPy config below.

import numpy
numpy.show_config()
blas_mkl_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['/home/myname/.conda/envs/mylib/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['/home/myname/.conda/envs/mylib/include']
blas_opt_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['/home/myname/.conda/envs/mylib/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['/home/myname/.conda/envs/mylib/include']
lapack_mkl_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['/home/myname/.conda/envs/mylib/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['/home/myname/.conda/envs/mylib/include']
lapack_opt_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['/home/myname/.conda/envs/mylib/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['/home/myname/.conda/envs/mylib/include']
Supported SIMD extensions in this NumPy install:
    baseline = SSE,SSE2,SSE3
    found = SSSE3,SSE41,POPCNT,SSE42,AVX,F16C,FMA3,AVX2
    not found = AVX512F,AVX512CD,AVX512_KNL,AVX512_KNM,AVX512_SKX,AVX512_CNL

My library reaches BLAS dynamically through ARPACK-NG's shared library and, per the GDB trace, ends up calling into libmkl_intel_lp64.so. This is confusing, however, because running ldd /home/myname/.conda/envs/mylib/lib/libarpack.so.2 shows no mention of MKL; libarpack.so.2 only lists libblas.so.3.

linux-vdso.so.1 (0x0000697945afd000)
libblas.so.3 => /home/myname/.conda/envs/mylib/lib/./libblas.so.3 (0x0000697945200000)
libgfortran.so.4 => /home/myname/.conda/envs/mylib/lib/./libgfortran.so.4 (0x0000697945972000)
libm.so.6 => /usr/lib/libm.so.6 (0x00006979450db000)
libc.so.6 => /usr/lib/libc.so.6 (0x0000697944ed1000)
libdl.so.2 => /usr/lib/libdl.so.2 (0x000069794596d000)
libquadmath.so.0 => /home/myname/.conda/envs/mylib/lib/./libquadmath.so.0 (0x0000697944e97000)
libgcc_s.so.1 => /home/myname/.conda/envs/mylib/lib/./libgcc_s.so.1 (0x0000697944e82000)
/usr/lib64/ld-linux-x86-64.so.2 (0x0000697945aff000)
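ldd only shows what the dynamic linker would resolve for that one file. To see which BLAS-related libraries are actually mapped into the running Python process, one option on Linux is to read /proc/self/maps. This is a rough sketch, not a definitive diagnostic: the keyword list and the function names `libs_from_maps` and `loaded_numeric_libs` are assumptions chosen for illustration.

```python
from pathlib import Path

# Substrings to look for in shared-library file names (an illustrative choice).
KEYWORDS = ("mkl", "blas", "lapack", "arpack")


def libs_from_maps(maps_text: str, keywords=KEYWORDS) -> list:
    """Return sorted, de-duplicated paths of mapped files whose file name
    contains any of the keywords.

    `maps_text` is expected to follow the /proc/<pid>/maps layout:
    address perms offset dev inode [path]
    """
    found = set()
    for line in maps_text.splitlines():
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping, no backing file
        path = fields[5].strip()
        name = path.rsplit("/", 1)[-1].lower()
        if path.startswith("/") and any(k in name for k in keywords):
            found.add(path)
    return sorted(found)


def loaded_numeric_libs() -> list:
    """Inspect the current process (Linux only)."""
    return libs_from_maps(Path("/proc/self/maps").read_text())
```

Printing `loaded_numeric_libs()` once right after `from mylib import mymethod` and again after `import numpy` would show whether libblas.so.3, libmkl_rt.so or libmkl_intel_lp64.so is present at each point, and from which directory it was loaded.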

If I had to guess, NumPy checks whether a BLAS library is already loaded when it is imported. If my code is called first, it loads libblas.so and NumPy happens to use that. However, if NumPy is imported first, it loads MKL as the BLAS library, which somehow interferes with libarpack.so.

Is my assessment correct, and is there a way to solve this problem?


Solution

  • As far as I can tell, my guess at the root cause was correct. I have arrived at a solution that, while not completely satisfactory, solves the problem: create the Anaconda environment with the nomkl package (i.e. conda create -n mylib_nomkl nomkl python=3.9 numpy). NumPy will then no longer swap the BLAS out from under ARPACK-NG.