I'm trying to profile a C++ program with callgrind, and I was certainly not expecting it to spend most of its time opening dynamic libraries.
I don't see any of the functions that I know to take time (according to logs), and I don't understand the call stack: What is calling this _dl_start? Dependencies loaded before main()?
I have a program spending several hours creating some 3000 objects, which is too long for my requirements.
I tried to analyse a run with callgrind locally, and it did give me workable results.
valgrind --tool=callgrind --collect-systime=msec <my_program>
I'm using kCachegrind to browse the results, as per official recommendations
Unfortunately, the code ran in only a few seconds on my machine on a debug build.
So I tried to profile the real program in a production environment, to compare the results. Admittedly the setup isn't exactly the same, and for practical reasons the real program does a lot more stuff than just what I tested locally, so I was expecting to have to sort through and ignore some functions.
But all I'm seeing is those _dl_start, _dl_sysdep_start, dl_main, _dl_relocate_object, _dl_lookup_symbol_x, do_lookup_x, etc., and none of my functions.
(Well, my program, since I'm answering my own question.)
Callgrind is showing perfectly valid information, it's just not analysing the right process.
What happens is that the real program uses a watchdog pattern, where a main program forks into a daemon process that actually does the heavy lifting.
I was running Valgrind on the main process, which appropriately reports that my program spends most of its time forking, because the watchdog does basically nothing besides that.
In my case, the program supports a --no-daemon
mode of operation that does not fork, so I just had to call valgrind with that instead.