
C++ profiling: clock cycle count


I'm using valgrind --tool=callgrind to profile a critical part of my C++ program.

The part itself takes less than a microsecond to execute, so I'm profiling it over a large number of loop iterations.

I noticed that the cost of each instruction is a multiple of 0.13% (as a percentage of the program's total execution time), so I only see values such as 0.13, 0.26, 0.52, and so on.

My question is: should I assume that this atomic quantity corresponds to one CPU cycle? See the screenshot. (The callgrind output is presented graphically with kcachegrind.)

[Screenshot: callgrind output displayed in kcachegrind]

Edit: By the way, looking at the machine code, I see that a mov takes 0.13%, so that's probably a clock cycle indeed.


Solution

  • Callgrind doesn't measure CPU time. It counts instruction reads (instructions executed), which is where the "Ir" label comes from. If the costs are multiples of 0.13% (especially since you confirmed this with mov), then the unit you are seeing is a single instruction read, not a clock cycle. Callgrind also has cache simulation options (--cache-sim=yes) that simulate cache behavior and report how many cache misses your code is likely to incur.

    Note that not all instructions take the same time to execute, so the percentages do not exactly match the time spent in each section. However, they still give you a good idea of where your program is doing the most work and is likely spending the most time.