I've been trying to profile our app (amd64, RHEL 7.6, built with GCC 5.3 and using MKL + OMP). I used perf record, but all I see is a small number of samples in the OMP library and nothing in main() or below. This happens both with a 10-minute run and with one that lasts only a second or so.
Is MKL + OMP doing some non-standard threading that perf can't follow?
I'll try starting the test and then separately attaching to it with perf record -p.
Does anyone have experience with perf record and MKL? Maybe VTune will work better!
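It seems that the problem was -fomit-frame-pointer versus -fno-omit-frame-pointer. I was building with -O3 -g3, and perf record failed to get the stacks for our code. I had thought that -g3 would inhibit -fomit-frame-pointer, but as far as I can tell -g3 only controls debug information and does not change code generation, so with optimization on GCC still omits the frame pointer on x86-64 unless -fno-omit-frame-pointer is given. Presumably MKL still has its frame pointers, so perf could get its stack traces.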
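For anyone who hits the same thing, here is a rough sketch of the two approaches I know of. The names myapp and myapp.c are placeholders, not our real build, and the MKL/OMP link flags are left out. The first function rebuilds with frame pointers kept and lets perf walk them; the second leaves the build alone and asks perf to unwind from DWARF debug info instead, assuming the installed perf supports that mode.

```shell
#!/bin/sh
# Sketch only: "myapp" and "myapp.c" are hypothetical names, and the
# MKL/OpenMP link flags that a real build needs are omitted.

# Original flags plus an explicit request to keep frame pointers.
# -g3 adds debug info only; it does not turn frame pointers back on.
CFLAGS="-O3 -g3 -fno-omit-frame-pointer"

# Option 1: rebuild with frame pointers, then record call graphs by
# walking them (fp is perf's frame-pointer unwinding mode).
profile_with_frame_pointers() {
    # $CFLAGS is deliberately unquoted so it splits into separate flags.
    gcc $CFLAGS -o myapp myapp.c &&
    perf record --call-graph fp ./myapp &&
    perf report
}

# Option 2: keep the existing -O3 -g3 build and unwind using the DWARF
# debug info instead of frame pointers (larger perf.data, more overhead).
profile_with_dwarf() {
    perf record --call-graph dwarf ./myapp &&
    perf report
}

# Nothing runs until one of the functions is called, e.g.:
#   profile_with_frame_pointers
```

Frame pointers cost a register in the optimized code but keep perf.data small; the DWARF mode avoids a rebuild but copies a chunk of stack with every sample, so it is probably better suited to short runs.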