debugging gdb shared-libraries cross-compiling coredump

GDB does not attempt to load any shared library when a core file is loaded


I am setting up a process to improve debugging of a remote Linux system by enabling core dumps on an embedded platform, but I'm having trouble getting gdb to load symbols from shared libraries after loading a core file.

I've amended my CI build system to automatically create --only-keep-debug ELF files for all binaries and libraries, indexed by build-id as seems to be recommended practice, in addition to --strip-unneeded versions for the target.
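For readers who want to reproduce this kind of split, here is a minimal sketch of what such a CI step might look like. It is not the actual build script from this setup: `DEBUG_ROOT`, `CROSS_PREFIX` and the three helper names are hypothetical, and it assumes the toolchain's `readelf -n` prints a `Build ID:` line for the file.

```shell
#!/bin/sh
# Sketch only: split an ELF file into a stripped target copy and a
# build-id-indexed debug file. DEBUG_ROOT, CROSS_PREFIX and the helper
# names are hypothetical, not part of any real tool.

# Map a build-id (hex string) to the conventional .build-id/xx/yyyy.debug path.
debug_path_for_id() {
    id=$1
    prefix=$(printf '%s' "$id" | cut -c1-2)
    rest=$(printf '%s' "$id" | cut -c3-)
    printf '%s/.build-id/%s/%s.debug\n' "${DEBUG_ROOT:-.}" "$prefix" "$rest"
}

# Print the build-id found in "readelf -n" output read on stdin.
build_id_from_notes() {
    sed -n 's/^[[:space:]]*Build ID:[[:space:]]*\([0-9a-f]\{1,\}\).*$/\1/p' | head -n 1
}

# Write the debug info under DEBUG_ROOT and a stripped copy next to the input.
split_debug() {
    elf=$1
    id=$("${CROSS_PREFIX:-}readelf" -n "$elf" | build_id_from_notes)
    [ -n "$id" ] || { echo "no build-id in $elf" >&2; return 1; }
    dbg=$(debug_path_for_id "$id")
    mkdir -p "$(dirname "$dbg")"
    "${CROSS_PREFIX:-}objcopy" --only-keep-debug "$elf" "$dbg"
    "${CROSS_PREFIX:-}strip" --strip-unneeded -o "$elf.stripped" "$elf"
}
```

With `DEBUG_ROOT=CWD`, a file whose build-id is b59700de946784bbf2d65f9993145d14a3ba9a89 would end up at CWD/.build-id/b5/9700de946784bbf2d65f9993145d14a3ba9a89.debug, which matches the path gdb reports reading in the session below.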

The debugging process:

  1. Trigger a deliberate assert to generate a core file on the target, and copy it back to the host's CWD.
  2. Recreate the target's file-system layout on the host under CWD, copying the stripped binary and the required stripped libraries to their correct locations. The list of required libraries is derived automatically by parsing the core (using cross-comp-readelf --all core.a.out | grep,awk,sort,etc...), which produces:
/lib/libfoo.so
/usr/lib/libbar.so

which allows the process to automatically create the following from the CI archive:

CWD/sysroot/bin/a.out *stripped-binary-that-crashed*
CWD/sysroot/lib/libfoo.so *some-stripped-libraries*
CWD/sysroot/usr/lib/libbar.so *more-stripped-libraries*
  3. Within CWD$ path/to/cross-comp-gdb (run under strace -e open,openat -o st.log ...)
(gdb) set sysroot CWD/sysroot
(gdb) set solib-absolute-prefix CWD/sysroot
(gdb) file sysroot/bin/a.out
Reading symbols from sysroot/bin/a.out...
Reading symbols from CWD/.build-id/b5/9700de946784bbf2d65f9993145d14a3ba9a89.debug...
(gdb) core-file core.a.out
[New LWP 10226]
Core was generated by `a.out'.
Program terminated with signal SIGABRT, Aborted.
#0  0xb661dd16 in ?? ()
  4. And gdb makes no attempt to load libfoo.so or libbar.so... [confirmed by strace st.log]
(gdb) info sharedlibrary 
No shared libraries loaded at this time.
$ grep "libfoo" st.log
$
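Step 2 above (deriving the file list from the core and populating the sysroot) could be sketched roughly as follows. This is not the grep/awk/sort pipeline actually used here. It assumes the core contains an NT_FILE note and that `readelf -n` prints each mapped path on its own indented line. The helper names and the artifact directory are hypothetical.

```shell
#!/bin/sh
# Sketch only: helper names and directory arguments are hypothetical.

# Print the unique absolute paths found in "readelf -n" (NT_FILE) output
# read on stdin. Path lines are indented and start with "/".
mapped_files() {
    sed -n 's|^[[:space:]]\{1,\}\(/.*\)$|\1|p' | sort -u
}

# Read target paths on stdin and copy each one from an artifact tree ($1)
# into the sysroot ($2), preserving the target path. Missing files are
# reported on stderr and skipped.
populate_sysroot() {
    artifacts=$1
    sysroot=$2
    while IFS= read -r path; do
        [ -f "$artifacts$path" ] || { echo "missing: $path" >&2; continue; }
        mkdir -p "$sysroot$(dirname "$path")"
        cp "$artifacts$path" "$sysroot$path"
    done
}

# Possible usage (ci-archive is a placeholder for the unpacked CI output):
#   cross-comp-readelf -n core.a.out | mapped_files \
#       | populate_sysroot ci-archive CWD/sysroot
```

Note that the NT_FILE list covers every file-backed mapping, so it includes the main binary and may include data files as well as libraries.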

I have confirmed that gdb's auto-solib-add is on, and I've also tried every other setting I could find in the gdb documentation, such as set auto-load safe-path /.

But I cannot fathom why gdb is not attempting to open and read symbols from libfoo.so or libbar.so once it has loaded the core-file.


Solution

  • This:

    (gdb) info sharedlibrary
    No shared libraries loaded at this time.

    means that GDB believes that no shared libraries were loaded when the core was produced.

    Thus GDB has no reason to look for any shared libraries.

    The root cause is usually a mismatch between the binary loaded with (gdb) file ... and the binary which actually produced the core.

    To check for such a mismatch, you can use eu-unstrip -n --core core.a.out. This should produce output similar to:

    0x565481252000+0x5000 347372645d444e4437d8784840e889e3baaee2f0@0x565481252368 . . /tmp/t
    0x7f3f0deb4000+0x1000 f782ff35a6a694d76861d9cef47136aefdb4d0f1@0x7f3f0deb4554 . - linux-vdso.so.1
    0x7f3f0deb6000+0x332b8 f5fc74ba82d83f70b1e38b1d1c4172ade591d3b6@0x7f3f0deb6248 /lib64/ld-linux-x86-64.so.2 /usr/lib/debug/.build-id/f5/fc74ba82d83f70b1e38b1d1c4172ade591d3b6.debug ld-linux-x86-64.so.2
    0x7f3f0dcb2000+0x1e3d90 3ddd476a0eddfeb6390b2791bd945afaa13978ff@0x7f3f0dcb2380 /lib/x86_64-linux-gnu/libc.so.6 /usr/lib/debug/.build-id/3d/dd476a0eddfeb6390b2791bd945afaa13978ff.debug libc.so.6
    

    The key is that the build-id of the main binary (/tmp/t above) must be the same as that of your sysroot/bin/a.out (in other words, it should be b59700de946784bbf2d65f9993145d14a3ba9a89).

    If the build-ids don't match, you've made a mistake somewhere.

    If they do match, there is a bug in your cross-gdb. Perhaps try a newer version.
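The build-id comparison described above can be scripted. The following is only a sketch: the two helper names are hypothetical, and it assumes `readelf -n` prints a `Build ID:` line and that eu-unstrip prints each module's build-id immediately before an `@`, as in the sample output above.

```shell
#!/bin/sh
# Sketch only: check that the binary given to "file" produced the core.
# Helper names are hypothetical.

# Print the build-id found in "readelf -n" output read on stdin.
elf_build_id() {
    sed -n 's/^[[:space:]]*Build ID:[[:space:]]*\([0-9a-f]\{1,\}\).*$/\1/p' | head -n 1
}

# Succeed if build-id $1 appears as a module build-id in
# "eu-unstrip -n --core" output read on stdin.
core_lists_build_id() {
    grep -q "[[:space:]]$1@"
}

# Possible usage:
#   id=$(readelf -n sysroot/bin/a.out | elf_build_id)
#   if eu-unstrip -n --core core.a.out | core_lists_build_id "$id"; then
#       echo "build-ids match"
#   else
#       echo "build-id mismatch"
#   fi
```

Searching the whole module list, rather than only its first line, avoids relying on the main binary always being listed first.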