cuda profiling nsight-compute

Why is NSight Compute "missing" my program's kernel launches?


I'm using NSight Compute to profile a program that launches some CUDA kernels. I know for certain that they are launched, but when I press the "play" button in NSight Compute, with profiling of all kernels enabled, the profiling run concludes (no crash or failure) with no kernels profiled. Why is that?


Solution

  • To determine why this is, don't just "play" to the end. Instead, press the "->:" button, which takes you to the next kernel launch, then perhaps advance again to the next API call (with "->*").

    After one of these steps, you are likely to see an error reported under "API Stream".

    Example:

    [Screenshot: part of an NSight Compute app window]

    In this case, we've hit ERR_NVGPUCTRPERM, a very common first error to encounter: your user doesn't have permission to access the GPU performance counters. You can add such permissions by following the instructions here.
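If you want to confirm the permission state outside the GUI, here is a minimal sketch for Linux. It relies on assumptions that do not come from the answer above: that the driver exposes its settings in `/proc/driver/nvidia/params`, and that the relevant key is `RmProfilingAdminOnly` (1 meaning counters are restricted to admin users), as I recall from NVIDIA's notes on this error. The helper name `profiling_restricted` is made up for illustration.

```python
from pathlib import Path
from typing import Optional

# Assumption: on Linux the NVIDIA driver lists its parameters in this file.
PARAMS_PATH = Path("/proc/driver/nvidia/params")


def profiling_restricted(params_text: str) -> Optional[bool]:
    """Hypothetical helper: True if counters are admin-only, False if open
    to all users, None if the (assumed) key is not present in the text."""
    for line in params_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "RmProfilingAdminOnly":
            return value.strip() == "1"
    return None


if __name__ == "__main__":
    try:
        state = profiling_restricted(PARAMS_PATH.read_text())
    except OSError:
        # File missing or unreadable: no NVIDIA driver, or not Linux.
        state = None
    if state is True:
        print("GPU performance counters are restricted to admin users.")
    elif state is False:
        print("GPU performance counters are available to all users.")
    else:
        print("Could not determine the setting from", PARAMS_PATH)
```

If this reports that the counters are restricted, a non-root profiling session would be expected to hit ERR_NVGPUCTRPERM; the actual fix is still the one described in the linked instructions.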