c arm profiling perf

Access frequency limitation for reading PMCCNTR_EL0?


I am using perf_event_open in my C profiling app to collect event data through perf. To improve performance, I followed the "Perf Userspace PMU Hardware Counter Access" documentation and read the PMU register directly with the mrs instruction.

I use the following code:

static struct perf_event_attr attr;
attr.type = PERF_TYPE_HARDWARE;
attr.config = PERF_COUNT_HW_CPU_CYCLES;
attr.exclude_kernel = 1;
attr.exclude_hv = 1;
attr.config1 = 3; // bit 1 requests user-space access, bit 0 requests a 64-bit counter

int fd = syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0);

ioctl(fd, PERF_EVENT_IOC_RESET, 0);
ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);

// Code where we want to measure performance. At certain points we call read_register_directly()

ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
close(fd);

uint64_t read_register_directly() {
  uint64_t value = 0;
  asm volatile("mrs %0, PMCCNTR_EL0 " : "=r" (value));
  return value;
}

The above code to read the register directly works properly with the perf configuration. The problem is that after ~25 reads of the register I am getting an "illegal instruction" error, though I'm not sure why.

I looked through the ARM docs for PMCCNTR_EL0 and some other resources but I haven't found anything which explains this illegal instruction error.


Solution

  • This was ultimately caused by the attributes passed to the perf_event_open system call. The thread that made the system call could read the register directly, but reads from other threads failed with an "Illegal Instruction" error.

    There is a perf_event_attr flag, inherit, which lets the user profile all threads in a process instead of just the thread that called perf_event_open. So in the above code I added two things to fix the code flow:
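For illustration, the inherit setting on its own could be sketched as below. This covers only the flag named above, not the full set of changes. The helper name init_cycle_attr is hypothetical, and the remaining fields are copied from the question's code.

```c
#include <linux/perf_event.h>
#include <string.h>

/* Hypothetical helper (not from the original post): fills in the same
 * attributes as the question and additionally sets the inherit flag. */
void init_cycle_attr(struct perf_event_attr *attr)
{
    memset(attr, 0, sizeof(*attr));
    attr->size = sizeof(*attr);
    attr->type = PERF_TYPE_HARDWARE;
    attr->config = PERF_COUNT_HW_CPU_CYCLES;
    attr->exclude_kernel = 1;
    attr->exclude_hv = 1;
    attr->config1 = 3;  /* same value as in the question: request user access */
    attr->inherit = 1;  /* extend the event to child tasks created after the open */
}
```

The filled-in struct would then be passed to syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0) as in the question. Since inherit applies only to children created after the event is opened, the call presumably needs to happen before the worker threads are started.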