[SOLVED] system wide perf_event

system wide perf_event_open

Using the perf CLI, we can measure system-wide counters:

$ sudo perf stat -e cpu-cycles
^C
 Performance counter stats for 'system wide':

     4 247 009 923      cpu-cycles                                                  

       2,183469627 seconds time elapsed

From the perf_event_open manual, I assumed the equivalent would be to monitor any pid (pid == -1) on any CPU (cpu == -1), but that doesn't seem to be possible:

   Arguments
   The pid and cpu arguments allow specifying which process and CPU
   to monitor:

   pid == 0 and cpu == -1
          This measures the calling process/thread on any CPU.

   pid == 0 and cpu >= 0
          This measures the calling process/thread only when running
          on the specified CPU.

   pid > 0 and cpu == -1
          This measures the specified process/thread on any CPU.

   pid > 0 and cpu >= 0
          This measures the specified process/thread only when
          running on the specified CPU.

   pid == -1 and cpu >= 0
          This measures all processes/threads on the specified CPU.
          This requires CAP_PERFMON (since Linux 5.8) or
          CAP_SYS_ADMIN capability or a
          /proc/sys/kernel/perf_event_paranoid value of less than 1.

   pid == -1 and cpu == -1
          This setting is invalid and will return an error.

What is the workaround here?

Solution

It looks like none of those options aggregates counts for you: each one either counts on a single core, or virtualizes the counter across context switches for a single thread (or a single-threaded process).

If you look at what system-wide perf stat -a does (e.g. with strace -f perf stat), you can see it calls perf_event_open once per event per core. It has to add up the counts for an event across cores; the system-call API won't do that for you.
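Here is a minimal sketch of that approach in C, not what perf itself does internally line for line. The helper names open_counter and count_system_wide are made up for illustration. It assumes online CPUs are numbered 0..n-1 with no holes (CPU hotplug can break that), and it lets each counter start counting as soon as it is opened, so the per-CPU measurement windows are slightly skewed.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: glibc has no perf_event_open wrapper, so use syscall(2).
 * Returns a file descriptor, or -1 with errno set. */
static int open_counter(pid_t pid, int cpu, uint32_t type, uint64_t config)
{
    struct perf_event_attr attr;

    memset(&attr, 0, sizeof(attr));
    attr.size = sizeof(attr);
    attr.type = type;
    attr.config = config;
    /* attr.disabled stays 0, so counting starts as soon as the event opens. */

    return (int)syscall(SYS_perf_event_open, &attr, pid, cpu, -1, 0UL);
}

/* Hypothetical helper: count one event on every online CPU for `seconds`
 * and add the per-CPU values in user space.
 * Assumes CPU ids are contiguous from 0 (an assumption, see above).
 * Returns 0 and stores the sum in *total, or -1 if any step fails. */
static int count_system_wide(uint32_t type, uint64_t config,
                             unsigned int seconds, uint64_t *total)
{
    long ncpus = sysconf(_SC_NPROCESSORS_ONLN);
    int *fds;
    long opened = 0;
    int rc = 0;

    if (ncpus < 1)
        return -1;
    fds = malloc((size_t)ncpus * sizeof(*fds));
    if (fds == NULL)
        return -1;

    /* pid == -1 and cpu >= 0: all processes/threads on that one CPU. */
    for (long cpu = 0; cpu < ncpus; cpu++) {
        fds[cpu] = open_counter(-1, (int)cpu, type, config);
        if (fds[cpu] < 0) {
            rc = -1;
            break;
        }
        opened++;
    }

    if (rc == 0) {
        uint64_t sum = 0;

        sleep(seconds);
        for (long i = 0; i < opened; i++) {
            uint64_t value;

            /* With the default read_format, read() yields a single u64 count. */
            if (read(fds[i], &value, sizeof(value)) != (ssize_t)sizeof(value)) {
                rc = -1;
                break;
            }
            sum += value;
        }
        if (rc == 0)
            *total = sum;
    }

    for (long i = 0; i < opened; i++)
        close(fds[i]);
    free(fds);
    return rc;
}
```

Calling count_system_wide(PERF_TYPE_HARDWARE, PERF_COUNT_HW_CPU_CYCLES, 2, &total) from a main() run as root (or with CAP_PERFMON, or a low enough perf_event_paranoid) should give a total comparable to the cpu-cycles figure from perf stat over the same interval. Without those privileges the per-CPU opens fail and the function returns -1.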