I'm using Nsight Compute to profile a program that launches some CUDA kernels. I know for certain that the kernels are launched, but when I press the "play" button in Nsight Compute, with profiling enabled for all kernels, the profiling session concludes (no crash or failure) with no kernels profiled. Why is that?
To determine why this happens, don't just "play" to the end. Instead, press the "->:" button, which takes you to the next kernel launch, and then perhaps advance again to the next API call (with "->*").
After one of these steps, you are likely to see an error reported in the "API Stream" window.
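If you prefer to check from a terminal, a rough equivalent is to run the program under the `ncu` command-line profiler and look for error lines. The sketch below assumes `ncu` is on your `PATH` and that it reports problems on lines prefixed with `==ERROR==` containing an `ERR_*` code; the helper names `extract_ncu_errors` and `profile_and_report` are hypothetical, not part of any NVIDIA tool.

```python
import re
import subprocess

# Matches Nsight Compute style error codes such as ERR_NVGPUCTRPERM.
# Assumption: ncu prints such errors on lines starting with "==ERROR==".
_ERR_CODE = re.compile(r"\bERR_[A-Z0-9_]+\b")


def extract_ncu_errors(output: str) -> list[str]:
    """Return the distinct ERR_* codes found on ==ERROR== lines, in order of first appearance."""
    codes: list[str] = []
    for line in output.splitlines():
        if not line.startswith("==ERROR=="):
            continue
        for code in _ERR_CODE.findall(line):
            if code not in codes:
                codes.append(code)
    return codes


def profile_and_report(program: list[str]) -> list[str]:
    """Run `ncu <program...>` (assumed to be on PATH) and return any ERR_* codes it reported."""
    result = subprocess.run(["ncu", *program], capture_output=True, text=True)
    # Scan both streams, since we don't assume which one ncu writes errors to.
    return extract_ncu_errors(result.stdout + "\n" + result.stderr)
```

For example, `profile_and_report(["./my_app"])` (with `./my_app` standing in for your program) returns an empty list when no `ERR_*` codes were reported, and otherwise the codes to look up.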
Example:
In this case, we've gotten the ERR_NVGPUCTRPERM error (very commonly the first one people encounter): your user doesn't have permission to access the GPU performance counters. You can add such permissions by following the instructions here.
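On Linux, one way to confirm that this restriction is what's blocking you is to inspect the NVIDIA driver's parameter file. The sketch below assumes the driver exposes `/proc/driver/nvidia/params` with a line such as `RmProfilingAdminOnly: 1`, where `1` means only administrators may read the counters; the function names are hypothetical helpers, not a driver API.

```python
from pathlib import Path
from typing import Optional

# Assumption: on Linux, the NVIDIA driver lists its parameters in this file,
# including a line of the form "RmProfilingAdminOnly: 1".
PARAMS_PATH = Path("/proc/driver/nvidia/params")


def profiling_admin_only(params_text: str) -> Optional[bool]:
    """Return True if counters are admin-only, False if not, None if the key is absent."""
    for line in params_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "RmProfilingAdminOnly":
            return value.strip() == "1"
    return None


def check_local_driver(path: Path = PARAMS_PATH) -> Optional[bool]:
    """Read the driver parameter file if present (Linux only); otherwise return None."""
    if not path.exists():
        return None
    return profiling_admin_only(path.read_text())
```

If `check_local_driver()` returns `True`, you'll hit ERR_NVGPUCTRPERM as a regular user. As far as I know, NVIDIA's documented Linux remedy is to load the driver with the option `NVreg_RestrictProfilingToAdminUsers=0` (or to profile as an administrator), but follow the linked instructions for the exact steps on your platform.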