I have a project that runs thousands of threads, and I want to use Nsight Systems to profile its CUDA code. However, loading the report takes a while, which I believe is due to the large amount of thread information. The threads also add visual clutter that I don't currently care about.
Is there a way to toggle collecting thread information, or to limit it before loading a report in the Nsight Systems GUI?
Is there a way to toggle collecting thread information?
If profiling through the CLI
Check the -s/--sample and --cpuctxsw options for the profile or start commands (see the documentation). You can set both to none to minimize the amount of information collected from the CPU side.
If profiling a Linux target: also check the -t/--trace option for the profile or launch commands. Essentially, you want to exclude osrt from the trace options, since it is enabled by default.
If you want to collect only CUDA events, you can use nsys profile -t cuda -s none --cpuctxsw=none <app>.
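As a minimal sketch of that CUDA-only invocation wrapped in a script: the application path ./my_cuda_app and the report name cuda_only are placeholders I made up, and -o is added only to name the output report. The script prints the command and runs it only if nsys and the application are actually present.

```shell
# Placeholder application and report name (assumptions, not from the question).
APP="${APP:-./my_cuda_app}"
REPORT="${REPORT:-cuda_only}"

# -t cuda          trace only CUDA (leaves out osrt and other default traces)
# -s none          disable CPU IP/backtrace sampling
# --cpuctxsw=none  disable CPU context-switch tracing
# -o               name of the generated report file
NSYS_CMD="nsys profile -t cuda -s none --cpuctxsw=none -o $REPORT $APP"

echo "$NSYS_CMD"

# Run only when nsys is installed and the application exists.
# Note: the unquoted expansion assumes paths without spaces.
if command -v nsys >/dev/null 2>&1 && [ -x "$APP" ]; then
  $NSYS_CMD
fi
```

With per-thread CPU sampling and context-switch tracing turned off, the report should carry far less per-thread data, which is what slows loading in the first place.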
If profiling through the GUI
You can deselect the "Collect CPU IP/backtrace samples" and "Collect CPU context switch trace" boxes.
If profiling a Linux target: you can additionally deselect the "Collect OS runtime libraries trace" box.
Is there a way to limit it before loading a report in the Nsight Systems GUI?
Once the data has been collected, it is not possible to exclude it from rendering in the GUI. You can minimize threads, or hide them by right-clicking on "Threads" -> "Show less".