
How to correctly measure IPC (Instructions per cycle) with perf


I wonder how to measure instructions per cycle correctly using perf. For reference, the paper at http://www2.engr.arizona.edu/~tosiron/papers/SPEC2017_ISPASS18.pdf used inst_retired.any and cpu_clk_unhalted.ref_tsc for its calculations, and I'm now wondering whether this is the correct approach. By comparison, PAPI uses its preset events PAPI_TOT_INS and PAPI_TOT_CYC to calculate IPC.
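A minimal sketch of collecting the raw counts from perf for either calculation. It assumes the counts were gathered with something like `perf stat -x, -e inst_retired.any,cpu-cycles,cpu_clk_unhalted.ref_tsc`, and that each CSV line has the usual layout of count, unit, event name, and so on. The helper name `parse_perf_stat_csv` and the sample numbers are made up for illustration.

```python
import csv
import io


def parse_perf_stat_csv(text):
    """Return {event_name: count} from `perf stat -x,` output.

    Assumes each data line is: count,unit,event,... (the usual CSV layout).
    Comment lines and counts such as "<not counted>" are skipped.
    """
    counts = {}
    for row in csv.reader(io.StringIO(text)):
        if len(row) < 3:
            continue  # blank line, "# started on ..." comment, etc.
        try:
            value = float(row[0])
        except ValueError:
            continue  # "<not counted>" / "<not supported>"
        counts[row[2]] = value
    return counts


# Hypothetical output, not a real measurement.
sample = """\
8000000000,,inst_retired.any,2000000000,100.00,,
4000000000,,cpu-cycles,2000000000,100.00,,
5000000000,,cpu_clk_unhalted.ref_tsc,2000000000,100.00,,
"""
counts = parse_perf_stat_csv(sample)
print(counts["cpu-cycles"])  # 4000000000.0
```

Event names can appear with modifiers (for example `cpu-cycles:u`), so the dictionary keys should be checked against what your perf version actually prints.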

After some measurements I found the following:

On an example benchmark, cpu-cycles differs from cpu_clk_unhalted.ref_tsc by about 25%. The question now is which of the two values is correct for the calculation. Or are both approaches wrong?
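With the same instruction count in the numerator, the two "IPC" values differ by exactly the ratio of the two cycle counts, so a 25% gap in cycles becomes a 25% gap in IPC. A small worked example with made-up counts, chosen so that ref_tsc is 25% larger than cpu-cycles (the function name `ipc` is just for illustration):

```python
def ipc(instructions, cycles):
    """Instructions divided by whichever cycle count is passed in."""
    return instructions / cycles


# Made-up counts, chosen so ref_tsc is 25% larger than cpu-cycles.
instructions = 8.0e9   # inst_retired.any
core_cycles = 4.0e9    # cpu-cycles
ref_cycles = 5.0e9     # cpu_clk_unhalted.ref_tsc

core_ipc = ipc(instructions, core_cycles)
ref_ipc = ipc(instructions, ref_cycles)

# Same numerator, so the ratio of the two results is just the cycle ratio.
print(core_ipc, ref_ipc, ref_cycles / core_cycles)  # 2.0 1.6 1.25
```

If the core had been running above its rated speed (turbo), cpu-cycles would be the larger count and the direction of the gap would flip.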


Solution

  • cpu-cycles counts actual core clock cycles, so it ticks at the current core frequency, which changes with turbo / power-saving P-states. Use it if you care about microarchitectural things, like how close you're getting to the 4-uops-per-clock front-end bottleneck; it's the denominator for IPC in that sense.

  • cpu_clk_unhalted.ref_tsc counts reference cycles, which always tick at (close to) the rated / sticker speed of the CPU (e.g. a fixed 4008 MHz on my 4 GHz i7-6700K). Use it (or task-clock) if you care about work per unit time, including the CPU's choice to turbo high or to stay at a low clock speed when partly memory-bound. That choice depends on the energy-performance preference (EPP) setting.

    Fun fact: ref_tsc uses the same clock source as RDTSC, but the event counter doesn't tick while the clock is halted (e.g. during CPU frequency transitions). See Lost Cycles on Intel? An inconsistency between rdtsc and CPU_CLK_UNHALTED.REF_TSC.

    (Semi-related: see How to get the CPU cycle count in x86_64 from C++? for more about TSC and rdtsc.)
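To make the cpu-cycles point concrete: a width limit such as a 4-uops-per-clock front end is a budget per core clock, so utilization has to be computed against cpu-cycles. A sketch with made-up counts; the uop count is assumed to come from an event like uops_issued.any (Intel-specific), and `issue_width_utilization` is a hypothetical helper.

```python
def issue_width_utilization(uops_issued, cycles, width=4):
    """Fraction of an assumed `width`-uops-per-clock slot budget that was used."""
    return uops_issued / (cycles * width)


# Made-up counts. uops would come from an event such as uops_issued.any (assumed).
uops = 12.0e9
core_cycles = 4.0e9   # cpu-cycles
ref_cycles = 5.0e9    # cpu_clk_unhalted.ref_tsc

print(issue_width_utilization(uops, core_cycles))  # 0.75, i.e. 3 uops per core clock
print(issue_width_utilization(uops, ref_cycles))   # 0.6, understates how busy the core was
```

Dividing by reference cycles here mixes in the clock-speed choice, which is exactly what you don't want for a per-clock bottleneck.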
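Because ref_tsc ticks at a fixed rate, it also converts to time, and the ratio of the two cycle counts gives the average clock speed while unhalted. A sketch assuming the fixed 4008 MHz reference rate quoted for the i7-6700K (substitute your own CPU's TSC frequency); the counts and function names are illustrative only.

```python
REF_MHZ = 4008.0  # assumed reference (TSC) frequency; CPU-specific


def average_core_mhz(core_cycles, ref_cycles, ref_mhz=REF_MHZ):
    """Average core clock while unhalted: reference rate scaled by core/ref."""
    return ref_mhz * core_cycles / ref_cycles


def unhalted_seconds(ref_cycles, ref_mhz=REF_MHZ):
    """Unhalted time, valid because ref cycles tick at a fixed rate."""
    return ref_cycles / (ref_mhz * 1e6)


# Made-up counts.
instructions, core_cycles, ref_cycles = 8.0e9, 4.0e9, 5.0e9

print(average_core_mhz(core_cycles, ref_cycles))          # about 3206.4 (MHz)
print(instructions / unhalted_seconds(ref_cycles) / 1e9)  # about 6.41 (G instructions/s)
```

Equivalently, instructions per reference cycle times the reference frequency gives instructions per unhalted second, which is why the ref_tsc-based ratio measures work per time rather than work per core clock.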
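The RDTSC comparison can also be turned into a number: over the same interval the TSC keeps advancing while the core is halted, but ref_tsc does not, so the shortfall estimates the halted share of that interval. A sketch assuming both values were taken on the same core over the same interval (with per-task counting, the gap also includes time the task was not scheduled); `halted_fraction` is a hypothetical name.

```python
def halted_fraction(tsc_delta, ref_tsc_count):
    """Share of TSC ticks in an interval that ref_tsc did not count.

    Assumes both values cover the same interval on the same core.
    """
    if tsc_delta <= 0:
        raise ValueError("tsc_delta must be positive")
    return 1.0 - ref_tsc_count / tsc_delta


# Made-up numbers: the TSC advanced 6e9 ticks while ref_tsc counted 5e9.
print(halted_fraction(6.0e9, 5.0e9))  # roughly 0.167
```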