I want to profile a portion of my C code (user_defined_function()) using oprofile and calculate the time taken to execute it. Any pointers on how to do this would be very helpful. Thanks in advance!
#include <stdio.h>

void user_defined_function(void); /* defined elsewhere */

int main(void)
{
    //some statements;
    //Begin Profiling
    user_defined_function();
    //End Profiling
    //some statements;
    return 0;
}
I don't see any turn-on / turn-off markers in the http://oprofile.sourceforge.net/doc/index.html and http://oprofile.sourceforge.net/faq/ documentation. Calling opcontrol (via fork+exec) with --start and --stop will probably help if the code to be profiled runs long enough.
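The fork+exec idea could be sketched roughly as below. This is only an illustration: run_command and profile_region_with_opcontrol are hypothetical helper names, and it assumes a legacy oprofile installation that still ships opcontrol, a profiling session that has already been set up, and sufficient privileges to run it.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork, exec argv[0] with argv, wait for it.
 * Returns the child's exit status, 127 if exec failed, -1 on other errors. */
int run_command(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127); /* only reached when exec fails */
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Hypothetical wrapper: assumes opcontrol is in PATH and already configured.
 * Returns 0 if both opcontrol calls succeeded, -1 otherwise. */
int profile_region_with_opcontrol(void (*region)(void))
{
    char *start[] = {"opcontrol", "--start", NULL};
    char *stop[]  = {"opcontrol", "--stop", NULL};

    if (run_command(start) != 0)
        return -1;
    region();                       /* the code to be profiled */
    return run_command(stop) == 0 ? 0 : -1;
}
```

In main this would be called as profile_region_with_opcontrol(user_defined_function). Each opcontrol invocation costs a fork, an exec and a wait, which is why this only makes sense when the profiled region runs much longer than that overhead.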
With the perf tool in profiling (sampling) mode, perf record (and probably also operf, which is based on the same perf_event_open syscall), you can profile the full program and add markers at the Begin Profiling and End Profiling points (using some custom tracing event). You can then dump the entire perf.data with perf script, find your marker events, and keep only the part of the profile between them (every event in perf.data has a timestamp, and the events are ordered or can be sorted by time).
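One possible way to get such markers, sketched under assumptions: put two empty, non-inlined functions around the region so that a user-space probe can be attached to them (for example with perf probe -x on the binary; perf probe prints the exact event name to pass to perf record -e). The names profile_begin_marker, profile_end_marker and run_marked_region are made up for this sketch, the function body is a stand-in workload, and the noinline attribute assumes GCC or Clang.

```c
#include <stdint.h>

/* Hypothetical marker functions. They do nothing, but stay as real,
 * non-inlined symbols so a uprobe can be placed on their entry points. */
__attribute__((noinline)) void profile_begin_marker(void)
{
    __asm__ volatile("" ::: "memory"); /* keep the call from being optimized away */
}

__attribute__((noinline)) void profile_end_marker(void)
{
    __asm__ volatile("" ::: "memory");
}

/* Stand-in for the real user_defined_function(): sums 0..999. */
static uint64_t user_defined_function(void)
{
    volatile uint64_t sum = 0;
    for (uint64_t i = 0; i < 1000; i++)
        sum += i;
    return sum;
}

/* Region of interest bracketed by the two markers. */
uint64_t run_marked_region(void)
{
    profile_begin_marker();             /* Begin Profiling */
    uint64_t result = user_defined_function();
    profile_end_marker();               /* End Profiling */
    return result;
}
```

When the probe events are recorded together with the sampling event, the perf script output should contain one line for each marker hit, and the samples whose timestamps fall between a begin line and the following end line belong to the region.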
By using the perf_event_open syscall directly, you can enable and disable profiling from within the same process: the ioctl calls described in the "man 2 perf_event_open" page are issued on the perf file descriptor with the PERF_EVENT_IOC_ENABLE / PERF_EVENT_IOC_DISABLE actions. The man page also describes using prctl to temporarily disable and re-enable profiling of the program (this may even work with oprofile: disable at the start of main, enable at Begin, disable at End).
From the man page, section "Using prctl(2)": "A process can enable or disable all the event groups that are attached to it using the prctl(2) PR_TASK_PERF_EVENTS_ENABLE and PR_TASK_PERF_EVENTS_DISABLE operations."
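The ioctl-based enable/disable can be sketched as follows, loosely modelled on the example in the perf_event_open man page. For brevity it uses a counting event (retired user-space instructions) instead of a sampling one, since sampling would additionally need an mmap'ed ring buffer; the ioctl calls are the same. count_instructions and sample_region are hypothetical names, and the call may fail (returning -1 here) when the kernel's perf_event_paranoid setting or the environment does not allow it.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <linux/perf_event.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Stand-in workload for the region between Begin and End Profiling. */
void sample_region(void)
{
    volatile unsigned long sum = 0;
    for (unsigned long i = 0; i < 1000; i++)
        sum += i;
}

/* Counts user-space instructions retired while region() runs.
 * Returns the count, or -1 if the counter could not be opened or read. */
long long count_instructions(void (*region)(void))
{
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof(attr);
    attr.config = PERF_COUNT_HW_INSTRUCTIONS;
    attr.disabled = 1;        /* start disabled, enable explicitly below */
    attr.exclude_kernel = 1;
    attr.exclude_hv = 1;

    /* glibc has no wrapper; pid = 0 (this process), cpu = -1 (any CPU) */
    int fd = (int)syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0UL);
    if (fd < 0)
        return -1;

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);   /* Begin Profiling */
    region();
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);  /* End Profiling */

    long long count = 0;
    if (read(fd, &count, sizeof(count)) != (ssize_t)sizeof(count))
        count = -1;
    close(fd);
    return count;
}
```

Called as count_instructions(user_defined_function), this reports only what happened between the two ioctl calls. Since the process opened the counter itself, the prctl operations quoted above should toggle it in the same way.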
Another way of using the performance counters is counting rather than sampling (perf stat ./your_program or perf stat -d ./your_program does this). This mode will not give you a list of 'hot' functions; it will only tell you that your code executed, say, 100 million instructions in 130 million cycles, with 10 million L1 cache hits and 5 million L1 cache misses. There are wrappers that enable counting on parts of a program, for example: PAPI http://icl.cs.utk.edu/papi/ (PAPI_start_counters), perfmon2 (libpfm3, libpfm4), likwid https://github.com/RRZE-HPC/likwid (pdf, likwid_markerStartRegion), http://halobates.de/jevents.html and http://halobates.de/simple-pmu, etc.