I have a set of shared libraries (Intel MKL) that are only distributed in binary form. A top-level "runtime" library, libmkl_rt.so, is a direct dependency of my executable and is visible with ldd:
...
libmkl_rt.so => /var/task/lib/libmkl_rt.so (0x00007f8049a1f000)
...
However, the other ones, such as libmkl_avx.so, are not visible with ldd. I assume they are loaded dynamically with dlopen(), because the executable throws an error saying the libraries are missing when they can't be found.
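ldd only reports the dependencies recorded in the binaries themselves (their DT_NEEDED entries), so a library opened later with dlopen() only exists inside the running process. One way to confirm that libmkl_avx.so really is being dlopen()ed is to look at /proc/<pid>/maps of the running program, which lists every file mapped into its address space. Below is a minimal Python sketch, assuming Linux with procfs; the script name list_libs.py and the function mapped_shared_objects are hypothetical.

```python
import os
import re
import sys

# A shared-object file name: "libfoo.so" or a versioned one like "libfoo.so.6".
SO_NAME_RE = re.compile(r"\.so(\.\d+)*$")


def mapped_shared_objects(maps_text: str) -> list[str]:
    """Return the sorted, de-duplicated .so paths found in /proc/<pid>/maps text."""
    libs = set()
    for line in maps_text.splitlines():
        # Fields: address perms offset dev inode [pathname]
        parts = line.split(None, 5)
        if len(parts) == 6:
            path = parts[5].strip()
            # Skip pseudo-mappings such as [stack] and non-library files.
            if path.startswith("/") and SO_NAME_RE.search(os.path.basename(path)):
                libs.add(path)
    return sorted(libs)


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("usage: list_libs.py PID", file=sys.stderr)
    else:
        with open(f"/proc/{sys.argv[1]}/maps") as f:
            for path in mapped_shared_objects(f.read()):
                print(path)
```

Run it against the PID of the program while it is still running (for example during a long computation); if the assumption is right, libmkl_avx.so should appear in the list even though ldd does not show it.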
These libraries are large (> 100 MB), and this is the only executable in my container that uses them. I assume the executable doesn't call every function in these libraries, so I would like to slim them down: first determine which functions are actually called, then keep only those.
How can I determine which symbols in the dynamically loaded shared libraries are actually used and extract only those symbols into a "slim" copy of the library?
Determine which symbols in the dynamically loaded shared libraries are actually used?
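You can run your program under LD_DEBUG=bindings LD_BIND_NOW=1 and see which symbols from libmkl_avx.so were bound. LD_BIND_NOW=1 makes the dynamic loader resolve all references when a library is loaded, instead of lazily on the first call, and the LD_DEBUG output is written to stderr.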
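The LD_DEBUG output is verbose, so it helps to filter it. Below is a minimal Python sketch that runs a command with those two variables set and collects the symbols resolved to a given library, together with the objects that referenced them. It assumes glibc's bindings line format ("binding file A [0] to B [0]: normal symbol `name'"); the script name bound_symbols.py and both function names are hypothetical. Symbols that MKL looks up itself at run time should only show up if that code path is actually taken, so exercise the workloads you care about during the run.

```python
import os
import re
import subprocess
import sys
from collections import defaultdict

# Matches glibc's LD_DEBUG=bindings lines, for example:
#   12345:  binding file ./app [0] to /lib/libfoo.so [0]: normal symbol `foo' [VER_1]
# "<main program>" contains a space, so file names are matched non-greedily.
BINDING_RE = re.compile(
    r"binding file (?P<src>.+?) \[\d+\] to (?P<dst>.+?) \[\d+\]: "
    r"(?:normal|protected) symbol `(?P<sym>[^']+)'"
)


def symbols_bound_to(debug_output: str, lib_name: str) -> dict[str, set[str]]:
    """Map each symbol resolved to a library whose file name contains lib_name
    to the set of objects that referenced it."""
    result = defaultdict(set)
    for line in debug_output.splitlines():
        m = BINDING_RE.search(line)
        if m and lib_name in os.path.basename(m.group("dst")):
            result[m.group("sym")].add(m.group("src"))
    return dict(result)


def run_with_bindings(cmd: list[str]) -> str:
    """Run cmd with binding tracing enabled and return the captured stderr."""
    env = dict(os.environ, LD_DEBUG="bindings", LD_BIND_NOW="1")
    # ld.so writes LD_DEBUG output to stderr unless LD_DEBUG_OUTPUT is set.
    proc = subprocess.run(cmd, env=env, stderr=subprocess.PIPE, text=True)
    return proc.stderr


if __name__ == "__main__":
    if len(sys.argv) < 3:
        print("usage: bound_symbols.py LIB_NAME COMMAND [ARGS...]", file=sys.stderr)
    else:
        target, cmd = sys.argv[1], sys.argv[2:]
        bound = symbols_bound_to(run_with_bindings(cmd), target)
        for sym in sorted(bound):
            print(f"{sym}\t<- {', '.join(sorted(bound[sym]))}")
```

For example, `python3 bound_symbols.py libmkl_avx ./my_program` would print one line per symbol bound to libmkl_avx.so. Bindings from libmkl_avx.so to itself are kept on purpose, since internal references also need the symbol to exist.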
Extract only those symbols into a "slim" copy of the library?
Unfortunately this is not possible, for the same reasons you can't rearrange functions in an executable. Once code is linked, all internal jump and call targets and the locations of global variables are fixed and can't be changed. Even correctly disassembling linked code (to determine function boundaries and the call graph) is an unsolvable problem in general; tools like IDA use heuristics to alleviate it, but the problem remains.
This shouldn't be a huge problem, at least for memory usage, because the OS only loads the code pages your application actually uses: the library file is mapped into memory, but its pages are read in on demand.
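To check this for your own process, you can compare how much of each MKL library is mapped with how much is actually resident in RAM. Below is a minimal Python sketch, assuming Linux, where /proc/<pid>/smaps reports a Size and an Rss value (in kB) for every mapping; the script name smaps_rss.py and the function resident_vs_mapped are hypothetical. Rss also includes pages pulled in by kernel readahead, so it overstates the code that was actually executed, and none of this changes the size of the files on disk.

```python
import os
import re
import sys
from collections import defaultdict

# Mapping header lines start with "start-end" hex addresses.
HEADER_RE = re.compile(r"^[0-9a-f]+-[0-9a-f]+\s")


def resident_vs_mapped(smaps_text: str, needle: str) -> dict[str, tuple[int, int]]:
    """Sum (Size, Rss) in kB per mapped file whose file name contains needle."""
    totals = defaultdict(lambda: [0, 0])
    current = None  # path of the mapping whose fields we are reading, if tracked
    for line in smaps_text.splitlines():
        if HEADER_RE.match(line):
            parts = line.split(None, 5)
            path = parts[5].strip() if len(parts) == 6 else ""
            current = path if path and needle in os.path.basename(path) else None
        elif current is not None:
            key, _, value = line.partition(":")
            if key in ("Size", "Rss"):
                totals[current][0 if key == "Size" else 1] += int(value.split()[0])
    return {path: (size, rss) for path, (size, rss) in totals.items()}


if __name__ == "__main__":
    if len(sys.argv) < 3:
        print("usage: smaps_rss.py PID NAME_SUBSTRING", file=sys.stderr)
    else:
        with open(f"/proc/{sys.argv[1]}/smaps") as f:
            stats = resident_vs_mapped(f.read(), sys.argv[2])
        for path, (size, rss) in sorted(stats.items()):
            print(f"{path}: mapped {size} kB, resident {rss} kB")
```

For example, `python3 smaps_rss.py <pid> libmkl` while the program is running; reading another process's smaps generally requires running as the same user or with elevated privileges.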