I am profiling some code using the cppgraphqlgen library, which uses C++20 coroutines extensively in its internals.
I have profiled an application and found that some called-into methods have a higher hit count than their calling parents.
I have searched for `clone .actor` with reference to profiling and found nothing useful.
For classic synchronous code elsewhere in the program, it is easy to see that children always cost no more than their parents; the coroutine code does not follow this rule.
What is `clone .actor` in this context, and why do the "children" cost more than their parents here? Is there any way to tell what this operation is actually doing?
For context, this is how I gathered my profiling data:
```shell
LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libprofiler.so.0 CPUPROFILE=./prof.out ./my-program
/usr/bin/google-pprof --callgrind "$(realpath ./my-program)" ./prof.out > ./callgrind.out
kcachegrind
```
Please (strongly) consider switching to the much better and more capable Go pprof implementation (github.com/google/pprof). Sadly, distros continue to ship our old Perl implementation, but the upcoming 2.17 release has already had that pprof implementation removed. So, hopefully, this will give distros more encouragement to switch.
The .clone thingy is an artifact of optimization and demangling. Compilers sometimes create optimized copies of certain functions (e.g. by constant-propagating some arguments). pprof is supposed to strip this detail from the function name. (But sometimes you want to see those details and more, such as template arguments; see the --symbolize option for that.)
If or when you see mixed-up parent/child relations, consider checking your stack trace capturing method. Skipping the "first parent" stack frame is a known issue with frame-pointer-based stack trace capturing. See here: https://github.com/gperftools/gperftools/wiki/gperftools'-stacktrace-capturing-methods-and-their-issues#frame-pointers