Tags: linux, performance, cpu, perf, cpu-cycles

Measurement of CPU cycles with perf


Currently I'm working on measuring the time of code execution.

I can measure how long a function takes to execute (using tracepoints), but I also need to measure how many CPU cycles the execution of funcB took, and it would be good to know how perf measured the time I got (is it a simple difference of Unix timestamps, or does it really use PMU counters)?

I would be very grateful for your help.


I have created a simple .c program which has a function that is called twenty times, and this function has a delay of about 1 second.

#include <stdio.h>
#include <unistd.h>

void funcB(int l){
    usleep(1000*l);
    printf("l=%d\n", l);
}

int main(){
    for (int i = 0; i < 20; i++){
      printf("i=%d\n", i);
      funcB(1000);
    }
    return 0;
}

I compiled this file with the -g flag (to include debugging info). Then I used perf probe to create tracepoints at funcB entry and funcB return, ran perf record with those events (perf record -e $funcb_probe -e $funcb_ret_probe ./test), dumped the data with perf script (perf script --ns -F time,event --deltatime), and decoded it with my Python script.

(--deltatime: used to measure the time difference between events; -F time,event: used to make the dumped file easier to decode)

And I can plot the execution time of funcB.
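
For reference, the workflow above corresponds roughly to the commands below. This is only a sketch: the source file name test.c and the generated probe event names (probe_test:funcB and probe_test:funcB__return) are assumptions based on how perf probe usually derives names from a binary called test, so check the names that perf probe actually prints (or perf probe -l) on your system.

gcc -g -o test test.c

# create uprobes at funcB entry and at its return
# (may require root or a relaxed kernel.perf_event_paranoid setting)
sudo perf probe -x ./test funcB
sudo perf probe -x ./test funcB%return

# record only those two events while running the program
sudo perf record -e probe_test:funcB -e probe_test:funcB__return ./test

# dump nanosecond timestamps and per-event deltas for post-processing
sudo perf script --ns -F time,event --deltatime > funcb_times.txt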

UPDATE 29/04/2023

To visualise the problem I have two graphs (both runs execute insertion sort on 10k elements; the .c file below is updated).

#include <stdio.h>
#include <unistd.h>
#include <math.h>
#include <stdlib.h>

void insertionSort(int arr[], int n)
{
    int i, key, j;
    for (i = 1; i < n; i++) {
        key = arr[i];
        j = i - 1;
        
        while (j >= 0 && arr[j] > key) {
            arr[j + 1] = arr[j];
            j = j - 1;
        }
        arr[j + 1] = key;
    }
}

void funcB(int arr[], int n){
    insertionSort(arr, n);
}

int main(){
    for (int i = 0; i < 200; i++){
        int arr[10000];

        for (int f = 0; f < 10000 ;f++) {
            arr[f] = (rand() % 1000);
        }
        int n = sizeof(arr) / sizeof(arr[0]);
        printf("i=%d\n", i);
        funcB(arr, n);
    }
    return 0;
}

This one is without CPU stress: [graph of funcB execution time per tracepoint; peaks are still visible]

And this with CPU stress: [graph of funcB execution time per tracepoint; the CPU stress is visible from tracepoint 0 to about 110]


So increased CPU usage is visible on the graph (i.e. my function's execution time increases). My question is: is it possible to measure the exact time my function takes (independent of CPU usage), or to know how many CPU cycles my function took to execute (which should be independent of CPU usage)?


Solution

  • Yes, you can measure CPU cycles, issued instructions and other hardware events supported by the PMU. The kernel automatically re-configures the PMU when rescheduling your process, so events from other processes are not counted.

    With tracepoints, you can use group leader sampling:

    perf record -e '{probe_a:funcB,cycles:u}:S'
    

    This adds the PMU event cycles, counting only in user space (:u), to the counter group and enables leader sampling (:S), so that when the leader is sampled (here the uprobe tracepoint probe_a:funcB) the values of both counters are recorded. See perf help list for more information about the event specification syntax.

    Alternatively, you can use the perf_event_open syscall to configure and retrieve PMU counters directly from C code; depending on your circumstances, this may be more convenient.

    The perf_event_open man page offers a short example that demonstrates how to read a single counter. For reading multiple counters, you can add them to a group and retrieve them with a single read syscall: they will be frozen on the privilege change (if configured with .exclude_kernel = 1), making the read-out atomic.
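
    As a concrete illustration, here is a minimal sketch along the lines of the man page example: it counts user-space CPU cycles around a single call. The busy loop inside funcB is only a stand-in for the code you actually want to measure, and running this may require a permissive kernel.perf_event_paranoid setting.

    #define _GNU_SOURCE
    #include <linux/perf_event.h>
    #include <sys/ioctl.h>
    #include <sys/syscall.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    /* perf_event_open has no glibc wrapper, so call it via syscall(2). */
    static long perf_event_open(struct perf_event_attr *attr, pid_t pid,
                                int cpu, int group_fd, unsigned long flags)
    {
        return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
    }

    /* Stand-in workload; replace with the code you want to measure. */
    static void funcB(void)
    {
        volatile unsigned long x = 0;
        for (unsigned long i = 0; i < 10000000UL; i++)
            x += i;
    }

    int main(void)
    {
        struct perf_event_attr attr;
        memset(&attr, 0, sizeof(attr));
        attr.type = PERF_TYPE_HARDWARE;
        attr.size = sizeof(attr);
        attr.config = PERF_COUNT_HW_CPU_CYCLES;
        attr.disabled = 1;        /* start disabled; enabled explicitly below */
        attr.exclude_kernel = 1;  /* count user-space cycles only */
        attr.exclude_hv = 1;

        /* Count for the calling process on any CPU, not part of a group. */
        int fd = perf_event_open(&attr, 0, -1, -1, 0);
        if (fd == -1) {
            perror("perf_event_open");
            return 1;
        }

        ioctl(fd, PERF_EVENT_IOC_RESET, 0);
        ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);

        funcB();

        ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

        uint64_t cycles = 0;
        if (read(fd, &cycles, sizeof(cycles)) != sizeof(cycles)) {
            perror("read");
            return 1;
        }
        printf("funcB used %llu CPU cycles\n", (unsigned long long)cycles);

        close(fd);
        return 0;
    }

    To read several counters at once as described above, open a second event (for example PERF_COUNT_HW_INSTRUCTIONS) with group_fd set to the first descriptor and set read_format = PERF_FORMAT_GROUP on the leader; a single read on the leader then returns the values of all counters in the group.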