cuda gpu profiling nvidia nvprof

Why don't I get "thread_inst_executed" in nvprof's output?


When I list nvprof's metrics with

nvprof --query-events

I see:

thread_inst_executed: Number of instructions executed by the active threads. For each instruction it increments by number of threads, including predicated-off threads, that execute the instruction. It does not include replays.

I would like to use this metric, so I collect metrics using:

nvprof --csv --metrics thread_inst_executed,inst_executed,inst_executed_global_loads,inst_executed_global_stores,inst_executed_local_loads,inst_executed_local_stores,inst_executed_shared_loads,inst_executed_shared_stores,gld_transactions,gst_transactions,local_load_transactions,local_store_transactions,shared_load_transactions,shared_store_transactions,l2_read_transactions,l2_write_transactions,dram_read_transactions,dram_write_transactions,sysmem_read_transactions,sysmem_write_transactions ./my_program my arguments

The output has every metric I asked for... except thread_inst_executed. Why is it missing? How can I get it?


Solution

  • You say you list nvprof's metrics with

    nvprof --query-events

    That isn't consistent usage: --query-events lists events, not metrics.

    Using nvprof (or nvvp), events and metrics are not the same thing.

    To query events, you would use:

    --query-events
    

    To query metrics, you would use:

    --query-metrics
    

    To profile and request an event measurement, you would use:

    --events name_of_event,...
    

    To profile and request a metric measurement, you would use:

    --metrics name_of_metric,...
    

    If you do something like this:

    --metrics name_of_event,...
    

    or

    --events name_of_metric,...
    

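    I don't know what the behavior is, but I would not expect it to work properly. Since thread_inst_executed appears in the --query-events listing, it is an event, so request it with --events rather than --metrics.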
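As a rough sketch of how the command from the question might be split, the snippet below puts thread_inst_executed under --events and leaves the other names under --metrics. The metric list is shortened for readability, ./my_program and its arguments are the placeholders from the question, and I have not run this on your GPU, so treat it as an assumption that the event is available on your device. The script only prints the command so you can inspect it before profiling.

```shell
#!/bin/sh
# thread_inst_executed is listed by --query-events, so it goes under --events.
EVENTS="thread_inst_executed"

# Names listed by --query-metrics stay under --metrics
# (shortened here; append the rest of the list from the question as needed).
METRICS="inst_executed,gld_transactions,gst_transactions"

# Both flags can appear in one nvprof invocation.
# ./my_program and "my arguments" are placeholders taken from the question.
CMD="nvprof --csv --events $EVENTS --metrics $METRICS ./my_program my arguments"

# Print the assembled command instead of running it.
echo "$CMD"
```

If the printed command looks right, run it directly. The event value should then show up in the event section of the output, separate from the metric results.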