cuda profiling nvidia nsight-systems

Is there a way in NVIDIA Nsight Systems to limit threads displayed?


I have a project with thousands of threads, and I want to use Nsight Systems to profile its CUDA code. However, loading the report takes a while, which I believe is due to the large amount of thread information. The threads I don't currently care about also add a lot of visual clutter.

Is there a way to toggle collecting thread information, or to limit it before loading a report in the Nsight Systems GUI?


Solution

  • Is there a way to toggle collecting thread information?

    If profiling through the CLI

    Check the -s/--sample and --cpuctxsw options of the profile and start commands (see the Nsight Systems CLI documentation). Setting both to none minimizes the amount of information collected from the CPU side.

    If profiling a Linux target: also check the -t/--trace option of the profile and launch commands. Essentially, you want to exclude osrt from the trace options, because it is enabled by default.

    If you want to collect only CUDA events, then you can use nsys profile -t cuda -s none --cpuctxsw=none <app>.
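
    As a rough sketch, the command above can be wrapped in a small script using the long forms of the same options. The names my_cuda_app and cuda_only are hypothetical placeholders, and the -o/--output option (which names the report file) is an addition that is not part of the command above:

    ```shell
    # Hypothetical placeholders: replace with your own binary and report name.
    APP="${APP:-./my_cuda_app}"
    REPORT="${REPORT:-cuda_only}"

    # --trace=cuda    : trace CUDA only, which leaves out osrt (on by default)
    # --sample=none   : no CPU IP/backtrace sampling
    # --cpuctxsw=none : no CPU context switch trace
    NSYS_ARGS="profile --trace=cuda --sample=none --cpuctxsw=none --output=$REPORT"

    # Show the command that would be run.
    echo "nsys $NSYS_ARGS $APP"

    # Run it only if nsys is installed and the application exists.
    if command -v nsys >/dev/null 2>&1 && [ -x "$APP" ]; then
      # NSYS_ARGS is intentionally unquoted so it splits into separate options.
      nsys $NSYS_ARGS "$APP"
    fi
    ```

    With CPU sampling, context switch tracing, and OS runtime tracing all off, the report should carry far less per-thread data, which is what slows loading and clutters the timeline.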

    If profiling through the GUI

    You can deselect the "Collect CPU IP/backtrace samples" and "Collect CPU context switch trace" boxes.


    If profiling a Linux target: you can additionally deselect the "Collect OS runtime libraries trace" box.


  • Is there a way to limit it before loading a report in the Nsight Systems GUI?

    Once the data has been collected, it cannot be excluded from rendering in the GUI. You can, however, minimize threads, or hide them by right-clicking "Threads" -> "Show less".