c++ cuda profiling nvidia nsight

Tracing custom CUDA kernels with Nsight Systems


I work on a library that is implemented in C++20 and CUDA 11. The library is called from Python via ctypes through a C API that just exchanges JSON strings. We compile it using Clang 11.

To profile the code I have added a lot of NVTX ranges to the C++ code. This works well for me with Nsight Systems: I can see the stack of ranges with their manually chosen names when I use nsys profile -t nvtx … to gather data. This doesn't tell me anything about the GPU, though, so I specify nvtx,cuda,cublas,cudnn in order to get more information.
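For readers who have not used NVTX, this is roughly what such a range annotation can look like. It is only a sketch, not the code from our library: ScopedNvtxRange and handle_request are hypothetical names, and the fallback stubs exist only so that the snippet compiles on a machine without the NVTX headers. nvtxRangePushA and nvtxRangePop are the push/pop calls from the NVTX C API.

```cpp
#if __has_include(<nvtx3/nvToolsExt.h>)
#include <nvtx3/nvToolsExt.h>
#else
// Fallback stubs (assumption: only here so the sketch builds without NVTX).
inline int nvtxRangePushA(const char*) { return 0; }
inline int nvtxRangePop() { return 0; }
#endif

// Hypothetical RAII helper: opens a named NVTX range on construction and
// closes it when the scope is left, so nested scopes show up as a stack.
class ScopedNvtxRange {
 public:
  explicit ScopedNvtxRange(const char* name) { nvtxRangePushA(name); }
  ~ScopedNvtxRange() { nvtxRangePop(); }

  ScopedNvtxRange(const ScopedNvtxRange&) = delete;
  ScopedNvtxRange& operator=(const ScopedNvtxRange&) = delete;
};

// Hypothetical function that is annotated with two nested ranges.
int handle_request(int x) {
  ScopedNvtxRange outer("handle_request");
  {
    ScopedNvtxRange inner("compute");
    x *= 2;  // stand-in for the real work
  }
  return x;
}
```

Ranges like these only describe CPU-side scopes. The kernels themselves are recorded by the cuda trace option, not by NVTX.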

But all I get is one of the many kernels. The output looks like this:

[Screenshot of the Nsight Systems timeline]

One can see the nice NVTX contexts and the calls to the CUDA API (memcpy and the like). But only one kernel shows up; I have marked it with a red arrow.

We have a bunch of different kernels and launch them with the <<<>>> syntax right from the .cu files.

It feels like I am missing either a tracing flag for nsys, some compilation option for the CUDA code, or some code annotations like NVTX for the kernel code. What do I have to do so that my custom kernels show up in the profile?


Solution

  • The issue may have been that I did not stop the data gathering properly. Our program is an interactive server that one stops with SIGINT, so perhaps the data was not stored properly after the interrupt.

    I have added calls to the profiler API to the code so that cudaProfilerStop() is called explicitly once our main loop is done. I did this with a small RAII wrapper so that it also works with SIGINT.

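    #include <cuda_profiler_api.h>
    
    // Starts profiler data collection on construction and stops it on
    // destruction, so the stop call happens whenever the scope is left.
    class ProfilingRange {
     public:
      ProfilingRange() {
        cudaProfilerStart();
      }
    
      ~ProfilingRange() {
        cudaProfilerStop();
      }
    
      // Non-copyable: a copy would call cudaProfilerStop() a second time.
      ProfilingRange(const ProfilingRange&) = delete;
      ProfilingRange& operator=(const ProfilingRange&) = delete;
    };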
    

    On the nsys profile command line I specify --capture-range=cudaProfilerApi and it seems to work fine. Now a lot of kernels show up, and I can learn a lot more about the system.
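One caveat worth spelling out: a destructor only runs if the scope is actually left, and with the default SIGINT disposition the process is terminated without unwinding. The sketch below shows one way the wrapper can be combined with a signal handler so that the main loop ends normally. It is an assumption about how such a server could be wired up, not our actual code: run_server, on_sigint and g_stop_requested are hypothetical names, and the fallback stubs are only there so that the snippet compiles without the CUDA toolkit.

```cpp
#include <csignal>

#if __has_include(<cuda_profiler_api.h>)
#include <cuda_profiler_api.h>
#else
// Fallback stubs (assumption: only here so the sketch builds without CUDA).
inline int cudaProfilerStart() { return 0; }
inline int cudaProfilerStop() { return 0; }
#endif

// Flag set by the signal handler; sig_atomic_t is safe to write from one.
static volatile std::sig_atomic_t g_stop_requested = 0;

extern "C" void on_sigint(int) { g_stop_requested = 1; }

class ProfilingRange {
 public:
  ProfilingRange() { cudaProfilerStart(); }
  ~ProfilingRange() { cudaProfilerStop(); }

  ProfilingRange(const ProfilingRange&) = delete;
  ProfilingRange& operator=(const ProfilingRange&) = delete;
};

// Hypothetical main loop: calls step() until SIGINT was received and
// returns the number of completed iterations.
template <typename Step>
int run_server(Step step) {
  g_stop_requested = 0;
  std::signal(SIGINT, on_sigint);  // replace the default "terminate" action

  ProfilingRange profiling;  // cudaProfilerStart() happens here
  int iterations = 0;
  while (!g_stop_requested) {
    step();  // stand-in for handling one request
    ++iterations;
  }
  return iterations;  // leaving the scope calls cudaProfilerStop()
}
```

Because the handler only sets a flag, the loop finishes its current iteration, the function returns, and the ProfilingRange destructor runs before the process exits.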