Tags: cuda, command-line-interface, profiling, nsight-compute

How can I get a kernel's execution time with NSight Compute 2019 CLI?


Suppose I have an executable myapp which needs no command-line argument, and launches a CUDA kernel mykernel. I can invoke:

nv-nsight-cu-cli -k mykernel myapp

and get output looking like this:

==PROF== Connected to process 30446 (/path/to/myapp)
==PROF== Profiling "mykernel": 0%....50%....100% - 13 passes
==PROF== Disconnected from process 30446
[30446] myapp@127.0.0.1
  mykernel(), 2020-Oct-25 01:23:45, Context 1, Stream 7
    Section: GPU Speed Of Light
    --------------------------------------------------------------------
    Memory Frequency                      cycle/nsecond      1.62
    SOL FB                                %                  1.58
    Elapsed Cycles                        cycle              4,421,067
    SM Frequency                          cycle/nsecond      1.43
    Memory [%]                            %                  61.76
    Duration                              msecond            3.07
    SOL L2                                %                  0.79
    SM Active Cycles                      cycle              4,390,420.69
    (etc. etc.)
    --------------------------------------------------------------------
    (etc. etc. - other sections here)

So far, so good. But now I just want the overall duration of mykernel, and no other output. Looking at nv-nsight-cu-cli --query-metrics, I see, among others:

gpu__time_duration           incremental duration in nanoseconds; isolated measurement is same as gpu__time_active
gpu__time_active             total duration in nanoseconds 
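
Aside: the full --query-metrics listing is rather long; assuming a Unix-like shell with grep available, one way to zero in on candidates like these is:

nv-nsight-cu-cli --query-metrics | grep time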

So, it must be one of these, right? But when I run

nv-nsight-cu-cli -k mykernel --metrics gpu__time_duration,gpu__time_active myapp

I get:

==PROF== Connected to process 30446 (/path/to/myapp)
==PROF== Profiling "mykernel": 0%....50%....100% - 13 passes
==PROF== Disconnected from process 30446
[30446] myapp@127.0.0.1
  mykernel(), 2020-Oct-25 12:34:56, Context 1, Stream 7
    Section: GPU Speed Of Light
    Section: Command line profiler metrics
    ---------------------------------------------------------------
    gpu__time_active                                   (!) n/a
    gpu__time_duration                                 (!) n/a
    ---------------------------------------------------------------

My questions:

  • Why are these two metrics reported as "n/a"?
  • How can I get just the kernel's overall duration, and nothing else?


Solution

  • tl;dr: You need to specify the appropriate 'submetric':

    nv-nsight-cu-cli -k mykernel --metrics gpu__time_active.avg myapp
    

    (Based on @RobertCrovella's comments)

    CUDA's profiling mechanism collects 'base metrics' (these are what --query-metrics lists). For each of these, multiple samples are taken. In version 2019.5 of NSight Compute, you can't get at the raw samples; you can only get 'submetric' values.

    'Submetrics' are essentially aggregations of the sequence of samples into a single scalar value. Different metrics have different sets of submetrics (see this listing); for gpu__time_active, these are .min, .max, .sum and .avg. And yes, in case you're wondering: second-moment submetrics, such as the variance or the sample standard deviation, are missing.

    So, you must either specify one or more submetrics (as in the example above, and in the sketches below), or alternatively upgrade to a newer version of NSight Compute, with which, apparently, you can just get all the samples.
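
    For illustration, a sketch of requesting several submetrics in a single profiling run (using the same hypothetical myapp and mykernel as above):

    nv-nsight-cu-cli -k mykernel --metrics gpu__time_duration.avg,gpu__time_duration.min,gpu__time_duration.max,gpu__time_duration.sum myapp

    If you want to post-process the resulting numbers, the CLI also has a --csv switch (in the versions I've checked, at least), which makes the output easier to parse.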
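
    Also, for what it's worth: in newer versions of NSight Compute the CLI binary is named ncu, and my understanding is that specifying just the base metric name collects all of its submetrics at once, along the lines of:

    ncu -k mykernel --metrics gpu__time_duration myapp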