c debugging gdb

How does GDB actually know what symbols to load in and at what address?


I have been writing a dynamic program loader in C. More specifically, I am writing a chain loader which runs some code before loading in the normal dynamic loader (ld.so). This allows me to run some custom code before the application is actually loaded into memory.

My custom loader is based on this open-source loader https://github.com/Ferdi265/dynamic-loader

While this works fine, GDB does not recognize that anything is loaded except the custom dynamic loader.

When I run "info shared", it returns "No shared libraries loaded at this time", even while the actual application is running.

And "info file" only shows information about the original loader.

I expected ld.so to handle loading the symbols for my application and to tell GDB where to find them through the _r_debug struct. So now, I am wondering: how does GDB actually know what shared objects are loaded in and at what addresses?


Solution

  • So now, I am wondering: how does GDB actually know what shared objects are loaded in and at what addresses?

    It knows through the rendezvous _r_debug structure. GDB reads the address of that structure from the DT_DEBUG entry in the dynamic section of the main executable, and it locates that dynamic section through the program headers that the AT_PHDR entry points to in the aux vector passed in by the kernel.

    In your case, AT_PHDR points to the program headers of your loader, and there is probably no PT_DYNAMIC in there (which is logical: why would there be?). From this GDB concludes that it is dealing with a fully static binary and that no shared library loading will ever happen.

    You can examine the aux vector with (gdb) info auxv.
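To make the chain concrete, here is a minimal sketch that follows the same path from inside a running process: AT_PHDR, then PT_DYNAMIC, then DT_DEBUG, then the link map hanging off r_debug. It assumes Linux with glibc (for getauxval and <link.h>) and a dynamically linked program. The function names locate_r_debug and print_loaded_objects are made up for illustration, and this is not how GDB itself is implemented (GDB reads the same data from the inferior's memory from outside).

```c
#define _GNU_SOURCE
#include <elf.h>
#include <link.h>      /* ElfW(), struct r_debug, struct link_map */
#include <stdint.h>
#include <stdio.h>
#include <sys/auxv.h>  /* getauxval() */

/* Hypothetical helper: follow AT_PHDR -> PT_DYNAMIC -> DT_DEBUG in the
   current process. Returns NULL if any link in the chain is missing. */
struct r_debug *locate_r_debug(void)
{
    const ElfW(Phdr) *phdr = (const ElfW(Phdr) *)getauxval(AT_PHDR);
    unsigned long phnum = getauxval(AT_PHNUM);
    uintptr_t bias = 0;

    if (phdr == NULL)
        return NULL;

    /* For a PIE, p_vaddr values are relative to the load bias. PT_PHDR
       lets us recover the bias from the run-time address of the headers. */
    for (unsigned long i = 0; i < phnum; i++)
        if (phdr[i].p_type == PT_PHDR)
            bias = (uintptr_t)phdr - phdr[i].p_vaddr;

    for (unsigned long i = 0; i < phnum; i++) {
        if (phdr[i].p_type != PT_DYNAMIC)
            continue;
        const ElfW(Dyn) *dyn = (const ElfW(Dyn) *)(bias + phdr[i].p_vaddr);
        for (; dyn->d_tag != DT_NULL; dyn++)
            if (dyn->d_tag == DT_DEBUG)   /* filled in by ld.so at startup */
                return (struct r_debug *)dyn->d_un.d_ptr;
    }
    return NULL;  /* no PT_DYNAMIC or no DT_DEBUG: looks like a static binary */
}

/* Hypothetical helper: print every link-map entry (load address and name),
   roughly the information "info shared" is built from.
   Returns the number of entries, or -1 if dbg is NULL. */
int print_loaded_objects(const struct r_debug *dbg)
{
    int n = 0;

    if (dbg == NULL)
        return -1;
    for (const struct link_map *m = dbg->r_map; m != NULL; m = m->l_next, n++)
        printf("%#lx  %s\n", (unsigned long)m->l_addr,
               (m->l_name != NULL && m->l_name[0] != '\0') ? m->l_name
                                                           : "(unnamed)");
    return n;
}
```

Calling print_loaded_objects(locate_r_debug()) from main in an ordinary dynamically linked program should list the executable and its libraries. In a process whose AT_PHDR points at headers without PT_DYNAMIC, locate_r_debug returns NULL, which corresponds to the situation described above.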

    You can also look at the GDB sources. Start with solib-svr4.c and the function elf_locate_base, which has this comment:

    /* Locate the base address of dynamic linker structs for SVR4 elf
       targets.
    
       For SVR4 elf targets the address of the dynamic linker's runtime
       structure is contained within the dynamic info section in the
       executable file.  The dynamic section is also mapped into the
       inferior address space.  Because the runtime loader fills in the
       real address before starting the inferior, we have to read in the
       dynamic info section from the inferior address space.
       If there are any errors while trying to find the address, we
       silently return 0, otherwise the found address is returned.  */
    

    Update:

    You can run a program by directly invoking the dynamic loader as the main program, like this: /usr/lib/x86_64-linux-gnu/ld-2.31.so ./myprogram. When I checked in GDB, the auxiliary vector it reads seems to belong to ld.so. Is this handled differently because ld.so is a static executable and doesn't have a dynamic section?

    On my system, /lib64/ld-linux-x86-64.so.2 does have PT_DYNAMIC:

     readelf -Wl  /lib64/ld-linux-x86-64.so.2 | grep DYNAMIC
      DYNAMIC        0x036e80 0x0000000000036e80 0x0000000000036e80 0x000180 0x000180 RW  0x8
    

    However, there is no DT_DEBUG in it, so the mechanism I described above wouldn't work.
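The same check can be done programmatically. The following is a rough sketch that reads an ELF file from disk, finds PT_DYNAMIC, and scans it for DT_DEBUG. It assumes a 64-bit ELF file in the host's byte order, and the function name elf_has_dt_debug is invented for this example.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper. Returns 1 if the file has a DT_DEBUG entry,
   0 if it has none (including when there is no PT_DYNAMIC at all),
   and -1 on an I/O error or if the file is not a 64-bit ELF. */
int elf_has_dt_debug(const char *path)
{
    FILE *f = fopen(path, "rb");
    Elf64_Ehdr eh;
    int result = -1;

    if (f == NULL)
        return -1;
    if (fread(&eh, sizeof eh, 1, f) != 1 ||
        memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64)
        goto out;

    result = 0;
    for (unsigned i = 0; i < eh.e_phnum; i++) {
        Elf64_Phdr ph;

        /* Read program header i from the program header table. */
        if (fseek(f, (long)(eh.e_phoff + (Elf64_Off)i * eh.e_phentsize),
                  SEEK_SET) != 0 ||
            fread(&ph, sizeof ph, 1, f) != 1) {
            result = -1;
            goto out;
        }
        if (ph.p_type != PT_DYNAMIC)
            continue;

        /* Scan the dynamic section as stored in the file. */
        if (fseek(f, (long)ph.p_offset, SEEK_SET) != 0) {
            result = -1;
            goto out;
        }
        for (Elf64_Xword off = 0; off + sizeof(Elf64_Dyn) <= ph.p_filesz;
             off += sizeof(Elf64_Dyn)) {
            Elf64_Dyn d;

            if (fread(&d, sizeof d, 1, f) != 1) {
                result = -1;
                goto out;
            }
            if (d.d_tag == DT_NULL)
                break;
            if (d.d_tag == DT_DEBUG) {
                result = 1;
                goto out;
            }
        }
    }
out:
    fclose(f);
    return result;
}
```

Run against a normal dynamically linked executable this should return 1. Based on the readelf observation above, I would expect 0 for /lib64/ld-linux-x86-64.so.2 on a system like mine, though that depends on how your ld.so was built.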

    The solib-svr4.c file I already mentioned also contains a fallback ("if the other method didn't work, try looking up symbols"), which looks for _r_debug (among other symbols). _r_debug is present in both the static and the dynamic symbol tables of my ld-linux*.so.2, so I am guessing that is the mechanism used in that case.