I'm trying to profile my software in order to optimize it.
I used gprof with the compilation flags -g -pg -O3, but the results are not precise enough.
Here is the output of my build:
$: make clean; make;
rm -f ./obj/*.o
rm -f ./bin/mdk-verifier
rm -f ./grammar/modal.output
rm -f ./grammar/modal.tab.h
rm -f ./grammar/*.cpp
rm -f ./lex.backup
bison -d -t -l -v -o ./grammar/modal.tab.c ./grammar/modal.y && mv ./grammar/modal.tab.c ./grammar/modal.tab.cpp
g++ -O3 -g -pg -fPIC -std=c++11 -I./include -c ./grammar/modal.tab.cpp -o ./obj/modal.tab.o
flex -l -b -o./grammar/lex.yy.cpp ./grammar/modal.lex
g++ -O3 -g -pg -I./include -c ./grammar/lex.yy.cpp -o ./obj/lex.yy.o
g++ -O3 -g -pg -fPIC -std=c++11 -I./include -c ./src/Kripke.cc -o ./obj/Kripke.o
g++ -O3 -g -pg -fPIC -std=c++11 -I./include -c ./src/Term.cc -o ./obj/Term.o
g++ -O3 -g -pg -fPIC -std=c++11 -I./include -c ./src/BooleanConstant.cc -o ./obj/BooleanConstant.o
g++ -O3 -g -pg -fPIC -std=c++11 -I./include -c ./src/Variable.cc -o ./obj/Variable.o
g++ -O3 -g -pg -fPIC -std=c++11 -I./include -c ./src/PropositionalVariable.cc -o ./obj/PropositionalVariable.o
g++ -O3 -g -pg -fPIC -std=c++11 -I./include -c ./src/Operation.cc -o ./obj/Operation.o
g++ -O3 -g -pg -fPIC -std=c++11 -I./include -c ./src/BooleanOperation.cc -o ./obj/BooleanOperation.o
g++ -O3 -g -pg -fPIC -std=c++11 -I./include -c ./src/ModalOperation.cc -o ./obj/ModalOperation.o
g++ -O3 -g -pg -fPIC -std=c++11 -I./include -c ./src/Formula.cc -o ./obj/Formula.o
g++ -O3 -g -pg -fPIC -std=c++11 -o ./obj/Main.o -c ./src/Main.cc
g++ -O3 -g -pg -static -lprofiler -o ./bin/mdk-verifier ./obj/modal.tab.o ./obj/lex.yy.o ./obj/Kripke.o ./obj/Term.o ./obj/BooleanConstant.o ./obj/PropositionalVariable.o ./obj/Variable.o ./obj/Operation.o ./obj/BooleanOperation.o ./obj/ModalOperation.o ./obj/Formula.o ./obj/Main.o
And here is how I call my program:
$: ./bin/mdk-verifier ./problem.txt < solution.txt
After execution everything looks fine and I get a gmon.out file. I then run gprof ./bin/mdk-verifier | more and get the following results:
Each sample counts as 0.01 seconds.
     % cumulative      self               self    total
  time    seconds   seconds     calls  ms/call  ms/call  name
 34.00       2.13      2.13        18   118.33   118.33  ModalOperation::checkBranch(Kripke&, unsigned int)
...
...
  5.91       4.98      0.37  54684911     0.00     0.00  BooleanOperation::checkBranch(Kripke&, unsigned int)
  4.63       5.27      0.29  54684911     0.00     0.00  PropositionalVariable::checkBranch(Kripke&, unsigned int)
Obviously, the call count for ModalOperation::checkBranch is wrong (it looks like it overflowed): when I print a message every time I enter this function, I see more than 18 calls.
So I looked for another, more precise profiler and found gperftools by Google.
I installed it on my Ubuntu machine and, following the tutorial, set the environment variable CPUPROFILE. This is what I get:
$: env | grep "CPU"
CPUPROFILE=./prof.out
I also passed -lprofiler when linking my executable, so I thought everything was set up and that I could start analyzing the data in ./prof.out.
Unfortunately, this file never appears. Nothing is created, so I have nothing to analyze.
Does anyone have an idea why ./prof.out is not created and why the profiler is not gathering data?
Thanks in advance for your help!
Best regards,
Your goal is to make your software take less time. There are multiple issues; first the negatives:
-O3: The compiler can optimize certain things. It cannot optimize the things that only you can optimize. What it can do is make them hard to find, by scrambling the code. The time to use -O3 is after you've found and fixed what you can.
gprof is venerable, but little more. It samples the program counter and counts function calls. Here is a list of problems with that.
It does give you a call graph, but speedups can easily hide in that.
gperftools is better (REVISED in response to Aliaksei's comment) because it is a true stack sampler. Normally it is a "CPU profiler", in which mode it is blind to any time spent blocked, such as in I/O or sleep. However, if you set the environment variable CPUPROFILE_REALTIME=1, you can make it sample on wall-clock time, so it will see I/O, sleeps, and other blocking system calls.
It has numerous output options.
It does not seem to make it easy to see a small random selection of the actual stack samples themselves, with line number information.
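As a rough sketch of the wall-clock mode described above, reusing the paths from the question (./bin/mdk-verifier, ./problem.txt, solution.txt, ./prof.out). The existence guard and the final check are illustrative additions, and the binary is assumed to really be linked against libprofiler:

```shell
# Where gperftools should write the profile (path taken from the question).
export CPUPROFILE=./prof.out
# Sample on wall-clock time instead of CPU time, so blocking calls show up.
export CPUPROFILE_REALTIME=1

# Run the program as in the question; skipped if the binary or inputs are missing.
if [ -x ./bin/mdk-verifier ] && [ -f ./problem.txt ] && [ -f solution.txt ]; then
    ./bin/mdk-verifier ./problem.txt < solution.txt
fi

# Report whether a profile was actually produced.
if [ -f "$CPUPROFILE" ]; then
    echo "profile written to $CPUPROFILE"
else
    echo "no profile at $CPUPROFILE (is libprofiler really linked in?)"
fi
```

If the file does appear, it can be inspected with the pprof tool that ships with gperftools (packaged as google-pprof on some Ubuntu releases), for instance in its --text mode.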
Now for the positive: