I am building a library (on Ubuntu 22) that uses onnxruntime under the hood. In turn, onnxruntime uses CUDA, dynamically loading some dedicated "backend". I build the whole code stack except the CUDA libraries, and none of the libraries have their RPATH or RUNPATH set (double-checked with readelf -d).
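The readelf -d check can be scripted when there are many libraries to inspect. The following is a minimal sketch, assuming binutils is installed; the helper names parse_search_tags and search_tags_of are made up for illustration and are not part of any tool mentioned here.

```python
import re
import subprocess

# Matches dynamic-section lines such as:
#  0x000000000000000f (RPATH)    Library rpath: [$ORIGIN/../lib]
#  0x000000000000001d (RUNPATH)  Library runpath: [/opt/lib]
_TAG_RE = re.compile(r"\((RPATH|RUNPATH)\)\s+Library r(?:un)?path:\s+\[(.*)\]")


def parse_search_tags(readelf_output):
    """Return {"RPATH": [...], "RUNPATH": [...]} parsed from `readelf -d` text."""
    tags = {"RPATH": [], "RUNPATH": []}
    for line in readelf_output.splitlines():
        match = _TAG_RE.search(line)
        if match:
            tag, value = match.groups()
            # Entries are colon-separated, like PATH.
            tags[tag].extend(p for p in value.split(":") if p)
    return tags


def search_tags_of(path):
    """Run `readelf -d` on one ELF file (hypothetical helper) and parse it."""
    result = subprocess.run(
        ["readelf", "-d", path], capture_output=True, text=True, check=True
    )
    return parse_search_tags(result.stdout)
```

Calling search_tags_of on each .so in the install folder should return two empty lists for every library that really has neither tag set.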
I build two apps. The first is a C++ app that links directly to my library. It has its RPATH set and everything works fine. If I run it with LD_DEBUG=libs I see output like this (note that the paths are edited and I'm showing only a tiny fraction of the debug output):
158834: calling init: .../install/bin/../lib/libonnxruntime_providers_cuda.so
158834:
158834: find library=libcudnn_ops_infer.so.8 [0]; searching
158834: search path=.../install/bin/../lib (RPATH from file .../install/bin/test)
158834: trying file=.../install/bin/../lib/libcudnn_ops_infer.so.8
158834:
158834:
158834: calling init: .../install/bin/../lib/libcudnn_ops_infer.so.8
158834:
This is what I expect, I'm happy.
However, I also need to use the very same library through some Python bindings that link against it. To make this work, I need to set the RPATH of the Python bindings (which, in my understanding at least, are just a shared library that gets loaded at runtime). Note that the Python executable has neither RPATH nor RUNPATH set. This works only in part: RPATH propagation seems to work while walking down the dependency tree until the loader starts searching for the CUDA libraries, and at that point it stops working. This is running exactly the same onnxruntime API in the same way, with the same build and the same files in the same folder as above. The only difference is the Python extension layer. The LD_DEBUG output looks like this:
159602: find library=libonnxruntime.so.1.15.1 [0]; searching
159602: search path=.../install/lib/../lib (RPATH from file .../install/lib/pyext.cpython-310-x86_64-linux-gnu.so)
159602: trying file=.../install/lib/../lib/libonnxruntime.so.1.15.1
[...]
159602: calling init: .../install/lib/pyext.cpython-310-x86_64-linux-gnu.so
159602:
159602: find library=libonnxruntime_providers_shared.so [0]; searching
159602: search path=.../install/lib/../lib (RPATH from file .../install/lib/pyext.cpython-310-x86_64-linux-gnu.so)
159602: trying file=.../install/lib/../lib/libonnxruntime_providers_shared.so
159602:
159602:
159602: calling init: .../install/lib/../lib/libonnxruntime_providers_shared.so
159602:
159602: find library=libonnxruntime_providers_cuda.so [0]; searching
159602: search path=.../install/lib/../lib (RPATH from file .../install/lib/pyext.cpython-310-x86_64-linux-gnu.so)
159602: trying file=.../install/lib/../lib/libonnxruntime_providers_cuda.so
159602:
159602: find library=libcublas.so.11 [0]; searching
159602: search cache=/etc/ld.so.cache
159602: search path=/lib/x86_64-linux-gnu/glibc-hwcaps/x86-64-v3:/lib/x86_64-linux-gnu/glibc-hwcaps/x86-64-v2:/lib/x86_64-linux-gnu/tls/haswell/x86_64:/lib/x86_64-linux-gnu/tls/haswell:/lib/x86_64-linux-gnu/tls/x86_64:/lib/x86_64-linux-gnu/tls:/lib/x86_64-linux-gnu/haswell/x86_64:/lib/x86_64-linux-gnu/haswell:/lib/x86_64-linux-gnu/x86_64:/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu/glibc-hwcaps/x86-64-v3:/usr/lib/x86_64-linux-gnu/glibc-hwcaps/x86-64-v2:/usr/lib/x86_64-linux-gnu/tls/haswell/x86_64:/usr/lib/x86_64-linux-gnu/tls/haswell:/usr/lib/x86_64-linux-gnu/tls/x86_64:/usr/lib/x86_64-linux-gnu/tls:/usr/lib/x86_64-linux-gnu/haswell/x86_64:/usr/lib/x86_64-linux-gnu/haswell:/usr/lib/x86_64-linux-gnu/x86_64:/usr/lib/x86_64-linux-gnu:/lib/glibc-hwcaps/x86-64-v3:/lib/glibc-hwcaps/x86-64-v2:/lib/tls/haswell/x86_64:/lib/tls/haswell:/lib/tls/x86_64:/lib/tls:/lib/haswell/x86_64:/lib/haswell:/lib/x86_64:/lib:/usr/lib/glibc-hwcaps/x86-64-v3:/usr/lib/glibc-hwcaps/x86-64-v2:/usr/lib/tls/haswell/x86_64:/usr/lib/tls/haswell:/usr/lib/tls/x86_64:/usr/lib/tls:/usr/lib/haswell/x86_64:/usr/lib/haswell:/usr/lib/x86_64:/usr/lib (system search path)
159602: trying file=/lib/x86_64-linux-gnu/glibc-hwcaps/x86-64-v3/libcublas.so.11
159602: trying file=/lib/x86_64-linux-gnu/glibc-hwcaps/x86-64-v2/libcublas.so.11
159602: trying file=/lib/x86_64-linux-gnu/tls/haswell/x86_64/libcublas.so.11
[...]
159602: calling fini: .../install/lib/../lib/libonnxruntime_providers_shared.so [0]
So basically libcublas is not found (nor are any of the other CUDA libs), triggering a fallback mechanism in onnxruntime that avoids using CUDA.
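Since the full LD_DEBUG=libs output is long, it can help to reduce it to "which library was looked up through which source". The following is a rough sketch that parses the line shapes visible in the logs above ("find library=", "search cache=", "search path=... (source)"); the function name summarize_lookups is an illustrative choice, and the parser assumes each log entry sits on a single line.

```python
import re

_FIND_RE = re.compile(r"find library=(\S+)")
# The source of a search path is the trailing parenthesised note, e.g.
# "(RPATH from file ...)" or "(system search path)".
_SOURCE_RE = re.compile(r"search path=.*\((.+)\)\s*$")


def summarize_lookups(ld_debug_text):
    """Map each searched library to the sources the loader consulted for it."""
    lookups = {}
    current = None
    for line in ld_debug_text.splitlines():
        found = _FIND_RE.search(line)
        if found:
            current = found.group(1)
            lookups.setdefault(current, [])
            continue
        if current is None:
            continue
        if "search cache=" in line:
            lookups[current].append("ld.so.cache")
            continue
        source = _SOURCE_RE.search(line)
        if source:
            lookups[current].append(source.group(1))
    return lookups
```

The text can come from the stderr of a process started with LD_DEBUG=libs in its environment. With the Python log above, libcublas.so.11 would map to the cache and the system search path only, while the onnxruntime libraries map to the RPATH of the extension module.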
Why does RPATH propagation work for the C++ app but not for the Python extension? Is there something silly I'm missing, or is it something deeper, related to how libraries are loaded in the context of a Python session? Could it be a weird manifestation of a bug in onnxruntime, maybe doing something wrong with dlopen?
Note that the same issue seems to be present in the Python version of onnxruntime itself: their setup.py makes sure that all dependencies are pre-loaded, using ctypes.CDLL with RTLD_GLOBAL.
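For reference, a pre-loading workaround of that kind could look roughly like the sketch below. It is not onnxruntime's actual code: the function name preload_shared_libs, the directory and the list of names are placeholders. The idea is that once a library is already loaded in the process, a later request for the same soname is satisfied without searching.

```python
import ctypes
import os


def preload_shared_libs(lib_dir, names):
    """Load each library by absolute path with RTLD_GLOBAL.

    Returns the handles (keep them referenced) and the names that failed.
    """
    handles, missing = [], []
    for name in names:
        path = os.path.join(lib_dir, name)
        try:
            # RTLD_GLOBAL makes the symbols visible to libraries loaded later.
            handles.append(ctypes.CDLL(path, mode=ctypes.RTLD_GLOBAL))
        except OSError:
            missing.append(name)
    return handles, missing


# Hypothetical usage, before importing the extension module
# (the directory is a placeholder; list dependencies before their users):
#   preload_shared_libs("/path/to/install/lib",
#                       ["libcublas.so.11", "libcudnn_ops_infer.so.8"])
```

This sidesteps the search path entirely, at the cost of hard-coding the library names in Python.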
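According to https://wiki.debian.org/RpathIssue, the dynamic linker (ld.so) looks for a matching library in the following locations, in this order:
1. the DT_RPATH of the library causing the lookup
2. the DT_RPATH of the executable
3. the LD_LIBRARY_PATH environment variable (unless the executable is setuid/setgid)
4. the DT_RUNPATH of the executable
5. /etc/ld.so.cache
6. the base library directories (/lib and /usr/lib)
So in your case:
- C++ app: the libraries have no RPATH, so rule 1 finds nothing, but the executable has one, so rule 2 applies (your log shows "RPATH from file .../install/bin/test").
- Python extension: the libraries that need libcublas have no RPATH and neither does the Python executable, so rules 1 and 2 find nothing and, as your log shows, the search falls through to /etc/ld.so.cache and the system search path (rules 5 and 6).
So, to make libonnxruntime load libcublas, you must set RPATH on libonnxruntime too (so that rule 1 applies).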
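If rebuilding with a different linker flag is inconvenient, one way to set the RPATH on an already built library is the patchelf tool. The sketch below only wraps its command line; it assumes patchelf is installed, that the CUDA libraries sit in the same folder as the patched library (hence the $ORIGIN default), and the helper names rpath_command and set_rpath are made up for illustration.

```python
import subprocess


def rpath_command(library, rpath="$ORIGIN", force_rpath=True):
    """Build the patchelf command line that sets the search path of `library`."""
    cmd = ["patchelf"]
    if force_rpath:
        # Without this flag patchelf writes DT_RUNPATH unless the file
        # already carries a DT_RPATH entry.
        cmd.append("--force-rpath")
    cmd += ["--set-rpath", rpath, library]
    return cmd


def set_rpath(library, rpath="$ORIGIN", force_rpath=True):
    """Run patchelf on `library`; raises CalledProcessError on failure."""
    subprocess.run(rpath_command(library, rpath, force_rpath), check=True)


# Hypothetical usage (placeholder path):
#   set_rpath("/path/to/install/lib/libonnxruntime.so.1.15.1")
```

In the log above, the lookup of libcublas.so.11 follows directly after libonnxruntime_providers_cuda.so is opened, so that library may need the same treatment.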
To help debug this, you can use the lddtree tool (apt install pax-utils) to get a hierarchical view of the library dependencies.