I'd like to measure the number of instructions executed in my program including speculative instructions that didn't retire. I know that linux perf can easily report the retired instruction count with:
$ perf stat --event instructions -- <my_program>
Is there a way to do that? I could not find a suitable performance event under perf list.
Is there another proxy to measure the amount of speculative execution in the processor?
More information: I have an Intel Skylake machine, but answers relevant to other Intel/AMD processors would be great as well.
The instructions event counts retired instructions. Speculative execution works in terms of uops; the CPU only cares about instruction boundaries at decode and retirement. (And somewhat in the uop cache.)
An event like uops_executed.thread is probably what you want, vs. uops_retired.retire_slots. But with micro-fusion, add eax, [rdi] is 1 fused-domain uop (issue and retire), but counts as two unfused-domain uops for uops_executed.thread.
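As a minimal sketch of counting both events in one run, something like the following could work. The wrapper name uops_exec_vs_retired is made up for illustration, and the event names are the Skylake-client ones mentioned above; check perf list on your own machine before relying on them.

```shell
# Hypothetical wrapper: count executed (unfused-domain) and retired
# (fused-domain) uops for one run of a program.
# Event names assume a Skylake-client CPU; verify them with `perf list`.
uops_exec_vs_retired() {
    perf stat -e uops_executed.thread,uops_retired.retire_slots -- "$@"
}

# Usage: uops_exec_vs_retired ./my_program arg1 arg2
```

Because of micro-fusion, the two counts are in different domains, so a difference between them is not by itself a measure of mis-speculation.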
uops_dispatched_port.port_0 / 1 / 5 / 6 for ALU ports, and 2,3,7 load/store-address and port 4 store-data events also exist; uops_executed.thread should mostly(?) be the sum of those per-port counters.
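To check how close the per-port sum comes to uops_executed.thread, one option is to have perf stat write CSV with -x, and add up the rows. This is only a sketch: PORT_EVENTS and sum_port_uops are names invented here, and it assumes the usual -x, layout where field 1 is the count and field 3 is the event name. Nine events are more than the programmable counters available, so perf will likely multiplex and report scaled estimates.

```shell
# All eight per-port events (Skylake-client names; verify with `perf list`).
PORT_EVENTS=uops_dispatched_port.port_0,uops_dispatched_port.port_1,uops_dispatched_port.port_2,uops_dispatched_port.port_3,uops_dispatched_port.port_4,uops_dispatched_port.port_5,uops_dispatched_port.port_6,uops_dispatched_port.port_7

# Hypothetical helper: read `perf stat -x,` CSV on stdin, sum the per-port
# counts, and print the sum next to the uops_executed.thread count.
sum_port_uops() {
    awk -F, '
        $3 ~ /^uops_dispatched_port\.port_[0-7]/ { ports += $1 }
        $3 ~ /^uops_executed\.thread/            { executed = $1 + 0 }
        END { printf "ports_sum=%.0f executed=%.0f\n", ports, executed }
    '
}

# Usage:
#   perf stat -x, -o counts.csv -e "uops_executed.thread,$PORT_EVENTS" -- ./my_program
#   sum_port_uops < counts.csv
```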
To count mis-speculation, you could compare uops_issued.any vs. uops_retired.retire_slots. That won't tell you how many of the mis-speculated uops actually got executed before mis-speculation was detected, and I don't know a good way to do that other than careful counting, like knowing how many unfused-domain uops there were in the uops_retired.retire_slots uops that retired. That may be doable in a microbenchmark loop; otherwise you'd just have to go by averages in a larger program.
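The issued-vs-retired comparison can be sketched as a small post-processing step on perf stat's CSV output. The helper name misspec_uops is hypothetical, and it assumes the -x, layout with the count in field 1 and the event name in field 3. It reports uops that were issued but never retired, not how many of them actually executed.

```shell
# Hypothetical helper: read `perf stat -x,` CSV on stdin and report how many
# issued (fused-domain) uops never retired.
# Event names assume a Skylake-client CPU; verify them with `perf list`.
misspec_uops() {
    awk -F, '
        $3 ~ /^uops_issued\.any/           { issued  = $1 + 0 }
        $3 ~ /^uops_retired\.retire_slots/ { retired = $1 + 0 }
        END {
            if (issued > 0)
                printf "issued=%.0f retired=%.0f discarded=%.0f (%.1f%%)\n",
                       issued, retired, issued - retired,
                       100 * (issued - retired) / issued
        }
    '
}

# Usage:
#   perf stat -x, -o counts.csv -e uops_issued.any,uops_retired.retire_slots -- ./my_program
#   misspec_uops < counts.csv
```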
(The event names I mention all exist on my Skylake-client CPU, and probably earlier and later Intel. AMD will have very different event names.)