cuda nvprof

local cache hit metric in cuda profiler


When profiling some CUDA applications, I see that the local hit rate (the local_hit_rate metric) is 0%.

I want to know which of the following two cases that value indicates:

  1. The application never accesses the local cache.

  2. All accesses to local cache were misses.

How can I find the answer? Since the values of inst_compute_ld_st, ldst_issued and ldst_executed are non-zero, is it safe to rule out the first case, or is there something else I should check?

The device is an M2000, which is CC 5.2.


Solution

  • nvprof supports both events (raw counters) and metrics. These can be queried using the following commands:

        nvprof --query-events
        nvprof --query-metrics

    CC5.*/6.* Local Memory Metrics

    local_*_request is the number of instructions executed to local memory via the generic address space or the local address space. On CC5.*/6.* I do not recall whether this includes fully predicated-off instructions.

    local_*_transactions is the number of cache accesses generated by those requests; it depends on the size (32-bit, 64-bit, ...) and the address divergence of each request. If this is non-zero, then local memory was accessed.

    l2_local_*_bytes is the number of bytes of data loaded from or stored to the L2 cache.
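
As a rough sketch of how the two cases from the question could be told apart, the check on local_*_transactions can be written as a small Python helper. This assumes you have already collected the values of local_load_transactions, local_store_transactions and local_hit_rate for a kernel and put them into a plain dict yourself; the function name and the dict layout are hypothetical and not part of nvprof.

```python
def classify_local_access(metrics):
    """Classify local-memory behaviour from already-collected metric values.

    `metrics` maps a metric name to its numeric value. The layout is an
    assumption made for this sketch, not an nvprof output format.
    """
    # Non-zero transactions mean local memory was accessed at all.
    transactions = (metrics.get("local_load_transactions", 0)
                    + metrics.get("local_store_transactions", 0))
    if transactions == 0:
        return "no local memory access"

    # Local memory was accessed, so a 0% hit rate is a real result,
    # not an artifact of having nothing to measure.
    if metrics.get("local_hit_rate", 0.0) == 0.0:
        return "local memory accessed, hit rate 0%"
    return "local memory accessed, some hits"


if __name__ == "__main__":
    sample = {
        "local_load_transactions": 0,
        "local_store_transactions": 0,
        "local_hit_rate": 0.0,
    }
    print(classify_local_access(sample))
```

With both transaction counts at zero, the 0% hit rate simply means there was nothing to hit or miss, which is case 1 in the question. The ldst_* counters alone would not settle this, under the assumption that they count load/store instructions in general rather than only those going to local memory.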