Tags: cuda, profiling, utilization, nvml

Is there any way, or is it even possible, to get the overall utilization of a GPU over a period of time?


I am trying to get information about the overall utilization of a GPU (mine is an NVIDIA Tesla K20, running on Linux) during a period of time. By "overall" I mean something like: how many streaming multiprocessors are scheduled to run, and how many GPU cores are scheduled to run (I suppose that if a core is running, it runs at its full speed/frequency?). It would also be nice if I could get the overall utilization measured in FLOPS.

Of course, before asking the question here, I searched and investigated several existing tools/libraries, including NVML (and nvidia-smi, built on top of it), CUPTI (and nvprof), PAPI, TAU, and Vampir. However, it seems (though I am not sure yet) that none of them can provide the information I need. E.g., NVML can report "GPU Utilization" as a percentage, but according to its documentation, this utilization is the "Percent of time over the past second during which one or more kernels was executing on the GPU", which is apparently not accurate enough for my purpose. nvprof can report FLOPs for individual kernels (with very high overhead), but I still don't know how well the GPU as a whole is utilized.
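
For reference, this is roughly how that counter is exposed through the NVML C API (a minimal sketch; error handling is mostly omitted, and in practice it would be polled in a loop):

```c
#include <stdio.h>
#include <nvml.h>   /* ships with the driver / GPU Deployment Kit; link with -lnvidia-ml */

int main(void) {
    nvmlReturn_t rc = nvmlInit();
    if (rc != NVML_SUCCESS) {
        fprintf(stderr, "nvmlInit failed: %s\n", nvmlErrorString(rc));
        return 1;
    }

    nvmlDevice_t dev;
    nvmlDeviceGetHandleByIndex(0, &dev);        /* first GPU in the system */

    /* "gpu" is the percent of time a kernel was executing over the last sample period,
       "memory" is the percent of time device memory was being read or written */
    nvmlUtilization_t util;
    nvmlDeviceGetUtilizationRates(dev, &util);
    printf("GPU: %u%%  memory: %u%%\n", util.gpu, util.memory);

    nvmlShutdown();
    return 0;
}
```

As the quoted documentation says, this is a coarse "was anything running" signal, not a measure of how many SMs or cores were actually busy.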

PAPI seems to be able to get the instruction count, but it cannot distinguish floating-point operations from other operations. I haven't tried the other two tools (TAU and Vampir) yet, but I doubt they can meet my needs.

So I am wondering: is it even possible to get this kind of overall utilization information for a GPU? If not, what is the best alternative for estimating it? The reason I am doing this is to find a better schedule for multiple jobs running on the GPU.

I am not sure if I've described my question clearly enough, so please let me know if there is anything I can add for a better description.

Thank you very much!


Solution

  • The NVIDIA Nsight plugin for Visual Studio has very nice graphical features that give the statistics you want. But I have the feeling that you are on a Linux machine, so Nsight won't work for you.

    I suggest using the NVIDIA Visual Profiler.

    The metrics reference is fairly complete and can be found here. That is how I would gather the data you are interested in; a sketch of metrics that map onto your list is at the end of this answer.

    Regarding frequency, I wouldn't worry, but it doesn't hurt to check with nvidia-smi (see the monitoring command at the end of this answer). If your card has enough cooling, it will stay at its peak frequency while running.

    Check the metrics reference, as it will provide you with much more useful information.

    I think nvprof also supports multiple processes; check here. You can also filter by process ID, so you can collect these metrics either "multi-context" or "single-context" (a command sketch is at the end of this answer). The metrics reference table has a column that states whether each metric can be collected in both cases.

    Note: the metrics are computed using the hardware performance counters and driver-level analysis. If the NVIDIA tools cannot provide more than this, it is not likely that other tools will be able to offer more. But I think that properly combining the metrics can tell you everything you want to know about your application's run.
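
    As a concrete sketch of counters that map onto the items in the question (the selection below and `./your_app` are only a suggestion drawn from the Kepler metrics reference, not an exhaustive list):

    ```
    # SM activity, occupancy and floating-point work for every kernel launched by your_app
    nvprof --metrics sm_efficiency,achieved_occupancy,ipc,flop_count_sp,flop_count_dp ./your_app
    ```

    Here sm_efficiency reports the percentage of time at least one warp is active on a multiprocessor (how busy the SMs are), achieved_occupancy reports how full each active SM is, and flop_count_sp/flop_count_dp give the floating-point work that can be turned into a FLOPS figure.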
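
    For the frequency check, something like the following polls the SM and memory clocks (together with NVML's utilization figure) once per second while your jobs run:

    ```
    nvidia-smi --query-gpu=timestamp,clocks.sm,clocks.mem,utilization.gpu --format=csv -l 1
    ```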
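
    For the multi-process case, a sketch of nvprof's profile-all-processes mode (the `%p` in the output name is expanded to each process ID; the exact options available depend on your CUDA toolkit version):

    ```
    # in one shell: profile every CUDA process on the machine, one output file per process
    nvprof --profile-all-processes -o gpu-usage.%p.nvprof
    # in other shells: launch the GPU jobs as usual; stop nvprof with Ctrl+C when done
    ```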
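
    Finally, one back-of-the-envelope way to combine these counters into the FLOPS-based utilization the question asks for: divide flop_count_sp (or flop_count_dp) by the kernel's elapsed time to get achieved FLOPS, then compare that with the card's peak, roughly 3.52 TFLOPS single precision and 1.17 TFLOPS double precision for a K20. The ratio gives a rough per-kernel floating-point utilization, which you can weight by kernel duration to get a figure for the whole run.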