I wonder how to measure instructions per cycle correctly using perf. As a reference, http://www2.engr.arizona.edu/~tosiron/papers/SPEC2017_ISPASS18.pdf used inst_retired.any and cpu_clk_unhalted.ref_tsc for their calculations, and I'm now wondering whether this is the correct approach. In comparison, PAPI uses the hardware counters PAPI_TOT_INS and PAPI_TOT_CYC to calculate the IPC.
After some measurements I concluded:
inst_retired.any:u seems to be the same as PAPI_TOT_INS
cpu-cycles seems to be the same as PAPI_TOT_CYC
On an example benchmark, cpu-cycles differs from cpu_clk_unhalted.ref_tsc by about 25%. The question now is which of the two values is the correct one for calculations. Or are both approaches wrong?
cpu-cycles counts actual core clock cycles; the core clock frequency changes with turbo / power-save P-states. Use it if you care about microarchitectural things like how close you're getting to the 4 uops per clock front-end bottleneck.
cpu_clk_unhalted.ref_tsc counts reference cycles, which always tick at (close to) the rated / sticker speed of the CPU (e.g. a fixed 4008 MHz on my 4GHz i7-6700k). Use it (or task-clock) if you care about work per time, including the choice to turbo high or to stay at a low clock speed when partly memory-bound. (That choice depends on EPP energy-performance-preference settings.)
(Fun fact: it uses the same clock source as RDTSC, but the event counter doesn't tick when the clock is halted, e.g. during CPU frequency transitions. See: Lost Cycles on Intel? An inconsistency between rdtsc and CPU_CLK_UNHALTED.REF_TSC)
(Semi-related: How to get the CPU cycle count in x86_64 from C++? has more about the TSC and rdtsc.)
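To make the difference between the two cycle counts concrete, here is a minimal Python sketch of the arithmetic. The function name ipc_report and all the numbers are hypothetical, chosen only to reproduce a 25% gap like the one in the question. The counts are assumed to come from a single perf run that collected inst_retired.any, cpu-cycles and cpu_clk_unhalted.ref_tsc together.

```python
def ipc_report(instructions, core_cycles, ref_cycles, ref_hz=None):
    """Derive both 'per cycle' metrics from raw event counts.

    instructions: count of inst_retired.any
    core_cycles:  count of cpu-cycles (actual core clock cycles)
    ref_cycles:   count of cpu_clk_unhalted.ref_tsc (reference cycles)
    ref_hz:       reference (TSC) frequency in Hz, if known (assumption:
                  the caller supplies it; it is not measured here)
    """
    report = {
        # Microarchitectural IPC: work per actual core clock cycle.
        "ipc_core": instructions / core_cycles,
        # Work per fixed-rate reference cycle: proportional to work per time.
        "inst_per_ref_cycle": instructions / ref_cycles,
        # > 1 means the core ran faster than the reference clock on average
        # (turbo); < 1 means it ran slower (power-save states).
        "core_to_ref_ratio": core_cycles / ref_cycles,
    }
    if ref_hz is not None:
        # Both events count only unhalted cycles, so their ratio scales the
        # reference frequency to the average core frequency while running.
        report["avg_core_hz"] = ref_hz * core_cycles / ref_cycles
    return report


# Hypothetical counts: core cycles exceed reference cycles by 25%.
example = ipc_report(
    instructions=2_000_000_000,
    core_cycles=1_250_000_000,
    ref_cycles=1_000_000_000,
    ref_hz=2.0e9,  # hypothetical reference frequency
)
for name, value in example.items():
    print(f"{name}: {value:g}")
```

With these made-up counts, the same run gives 1.6 instructions per core cycle but 2.0 instructions per reference cycle, so neither number is wrong; they answer different questions.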