I'm trying to profile (with Callgrind) a specific part of my code by removing noise and computation that I don't care about. Here is an example of what I want to do:
for (int i=0; i<maxSample; ++i) {
    //Prepare data to be processed...
    //Method to be profiled with these data
    //Post operation on the data
}
My use case is a regression test: I want to make sure that the method in question is still fast enough (something like less than 10% extra instructions compared with the previous implementation). This is why I'd like the cleanest possible output from Callgrind. (I need the for loop so that enough data is processed to give a good estimate of the behavior of the method I want to profile.)
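To make the "less than 10% extra instructions" criterion concrete, here is a minimal sketch of the comparison such a regression test could perform. The function name `within_instruction_budget` and its parameters are hypothetical, and the instruction counts are assumed to have been read from two Callgrind runs by some other means.

```cpp
#include <cstdint>

// Hypothetical helper (not part of Callgrind): returns true when `current`
// does not exceed `baseline` by more than `max_growth` (0.10 means 10%).
bool within_instruction_budget(std::uint64_t baseline,
                               std::uint64_t current,
                               double max_growth = 0.10) {
    // Allowed ceiling, e.g. a baseline of 1000000 allows about 1100000 at 10%.
    const double limit = static_cast<double>(baseline) * (1.0 + max_growth);
    return static_cast<double>(current) <= limit;
}
```

For example, `within_instruction_budget(1000000, 1050000)` accepts a 5% increase, while a 20% increase is rejected.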
My first try was to change the code to:
for (int i=0; i<maxSample; ++i) {
    //Prepare data to be processed...
    CALLGRIND_START_INSTRUMENTATION;
    //Method to be profiled with these data
    CALLGRIND_STOP_INSTRUMENTATION;
    //Post operation on the data
}
CALLGRIND_DUMP_STATS;
I added the Callgrind macros to control the instrumentation, and also passed the --instr-atstart=no option to be sure that I profile only the part of the code I want...
Unfortunately, with this configuration, when I launch my executable under Callgrind it never finishes... It is not simply a matter of slowness, because a full instrumentation run lasts less than one minute.
I also tried
for (int i=0; i<maxSample; ++i) {
    //Prepare data to be processed...
    CALLGRIND_TOGGLE_COLLECT;
    //Method to be profiled with these data
    CALLGRIND_TOGGLE_COLLECT;
    //Post operation on the data
}
CALLGRIND_DUMP_STATS;
(or the --toggle-collect="myMethod" option), but Callgrind gave me a log without any calls (KCachegrind is white as snow :( and reports zero instructions...)
Did I use the macros/options correctly? Any idea what I need to change in order to get the expected result?
I finally managed to solve this issue... This was a config issue:
I kept the code
for (int i=0; i<maxSample; ++i) {
    //Prepare data to be processed...
    CALLGRIND_TOGGLE_COLLECT;
    //Method to be profiled with these data
    CALLGRIND_TOGGLE_COLLECT;
    //Post operation on the data
}
CALLGRIND_DUMP_STATS;
But I ran Callgrind with --collect-atstart=no (and without --instr-atstart=no!!!) and it worked perfectly, in a reasonable time (~1 min).
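For reference, here is a compilable sketch of that setup. The names `method_to_profile` and `run_benchmark` are hypothetical stand-ins for the real code, and the fallback macros are my own assumption so that the file still builds on a machine without the Valgrind headers (in that case the macros do nothing).

```cpp
#include <numeric>
#include <vector>

// Use the real client-request macros when the Valgrind headers are present.
#if defined(__has_include)
#  if __has_include(<valgrind/callgrind.h>)
#    include <valgrind/callgrind.h>
#  endif
#endif
#ifndef CALLGRIND_TOGGLE_COLLECT
// Assumed fallback: no-op macros so the sketch builds without Valgrind.
#  define CALLGRIND_TOGGLE_COLLECT do {} while (0)
#  define CALLGRIND_DUMP_STATS do {} while (0)
#endif

// Hypothetical stand-in for the method under test.
long method_to_profile(const std::vector<int>& data) {
    return std::accumulate(data.begin(), data.end(), 0L);
}

// Hypothetical benchmark driver; returns a checksum of all results.
long run_benchmark(int max_sample) {
    long checksum = 0;
    for (int i = 0; i < max_sample; ++i) {
        std::vector<int> data(16, i);   // prepare data (collection is off)

        CALLGRIND_TOGGLE_COLLECT;       // switch collection on
        long result = method_to_profile(data);
        CALLGRIND_TOGGLE_COLLECT;       // switch collection off again

        checksum += result;             // post operation (collection is off)
    }
    CALLGRIND_DUMP_STATS;               // write the collected counters
    return checksum;
}
```

Call `run_benchmark` from `main`, then run the binary with something like `valgrind --tool=callgrind --collect-atstart=no ./bench` (the executable name is a placeholder). Outside Valgrind the client requests are no-ops, so the same binary can also be run normally.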
The issue with START/STOP instrumentation was that Callgrind dumps a file (callgrind.out.#number) at each iteration (each STOP), so it was really, really slow... (after 5 min I had only completed 5000 of the 300,000 iterations of the benchmark... unsuitable for a regression test).