c linker segmentation-fault shared-libraries

How to determine why this executable crashes on start? (C on *nix)


So I've got an application that builds and runs fine. I started playing with some compiler settings on a bunch of the 20+ static and dynamic libraries it links to, and the app consistently crashes on start at a particular point.

I reverted my build system mods, and the app is fine again. (However, I forgot to make a note of the changes I actually made...)

Purely as a matter of interest, since I still have a copy of the bad binary (built in debug mode), I'd like to try to reverse engineer exactly what I did to screw things up. :-)

Attaching a debugger, I can see that a function in a .so passes a function pointer to the main app, via a function call, for the main app to use as a callback.

In the frame of the caller (in the .so) the function pointer has one value. In the frame of the callee (in the main app), the function pointer magically has a different value.

When the main app attempts to call the stored function pointer, the program crashes with a segfault (mapper error). The function pointer still seems to hold the second value, which is presumably wrong.

Any suggestions on how to pick this apart?


Update: More details as requested.

Crash is at:

 0xfffffffffffeac58 ???????? ()
 0x000000000063a040 app_exec_callbacks()
 0xfffffd7ffea14514 dynlib_stuff ()
 0x000000000063a15e other_app_stuff () 
 0x00000000004d10a3 app_stuff ()
 0x0000000000499f20 main ()

Prior to this, the function dynlib_do_setup() passes a function pointer (the address of dynlib_callback_handler()) to the app via a call to app_register_callback().

Using a debugger, I can see that within dynlib_do_setup() the function pointer has a value of 0xfffffd7ffea13fa0. Inside the function body of app_register_callback(), the parameter suddenly has a value of 0xfffffffffffeac58.

Thus, the garbage value of 0xfffffffffffeac58 is stored in the app's callback table, and not surprisingly, the app crashes when that callback address is called as you can see in the stack trace above.
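For anyone trying to reproduce the shape of this, here is a minimal sketch of the registration pattern described above, with the pointer value logged on both sides of the call so the two frames can be compared without a debugger. The function names come from the question; the callback signature (void, no arguments), callback_fn, callback_table, MAX_CALLBACKS, last_callee_value and handler_calls are assumptions made up for illustration. In the real setup the caller lives in the .so and the callee in the main app, whereas here everything sits in one translation unit, so this shows the pattern only and is not a reproduction of the bug.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed callback signature; the real one is not shown in the question. */
typedef void (*callback_fn)(void);

#define MAX_CALLBACKS 8 /* hypothetical table size */

static callback_fn callback_table[MAX_CALLBACKS];
static size_t callback_count = 0;

/* Pointer value as seen inside the callee, kept for comparison. */
static uintptr_t last_callee_value = 0;

/* Counts handler invocations so the round trip can be checked. */
int handler_calls = 0;

/* Callee side (the main app): log the received pointer, then store it. */
int app_register_callback(callback_fn fn)
{
    /* Function pointer to integer conversion is implementation-defined,
       but works on common *nix platforms and is fine for logging. */
    last_callee_value = (uintptr_t)fn;
    fprintf(stderr, "callee sees   %#" PRIxPTR "\n", last_callee_value);

    if (fn == NULL || callback_count >= MAX_CALLBACKS)
        return -1;
    callback_table[callback_count++] = fn;
    return 0;
}

/* The callback the library wants the app to invoke later. */
void dynlib_callback_handler(void)
{
    handler_calls++;
}

/* Caller side (the .so): log the pointer before passing it.
   Returns 1 if both frames saw the same value, 0 otherwise. */
int dynlib_do_setup(void)
{
    callback_fn fn = dynlib_callback_handler;
    uintptr_t caller_value = (uintptr_t)fn;
    fprintf(stderr, "caller passes %#" PRIxPTR "\n", caller_value);

    if (app_register_callback(fn) != 0)
        return 0;
    return caller_value == last_callee_value;
}

/* App side: invoke every registered callback. */
void app_exec_callbacks(void)
{
    for (size_t i = 0; i < callback_count; i++)
        callback_table[i]();
}
```

Calling dynlib_do_setup() and then app_exec_callbacks() from main() should print the same address twice and run the handler once. In the bad binary the two logged values would differ, which narrows the problem down to the call boundary itself (argument passing) rather than to anything that touches the callback table later.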

Again, purely as a matter of educational interest, how/why can this magical corruption happen? How/why did reverting my build system changes suddenly fix this?


Solution

  • Found the answer:

    Compiler bug as the result of a certain optimization flag combo. Known but not fixed for the past 3 years. :(