To measure metrics/events for CUDA programs, I have tried using the command line like:
nvprof --metrics <<metric_name>>
I also measured the same metrics in the Visual Profiler, nvvp, and noticed no difference in the values I get.

I did notice a difference in output when I choose a metric like achieved_occupancy, but this varies with every execution, and that is probably why I get different results each time I run it, irrespective of whether I am using nvvp or nvprof.
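For example, a run like the following (where ./my_app is just a placeholder for my actual executable) reports a slightly different achieved_occupancy value each time, whether I launch it from the shell or from within nvvp:

nvprof --metrics achieved_occupancy ./my_app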
The question:
I was under the impression that nvvp and nvprof are exactly the same, and that nvvp is simply a GUI built on top of nvprof for ease of use. However, I have been given this advice:
Always use the visual profiler. Never use the command line.
Also, this question says:
I do not want to use the command line profiler as I need the global load/store efficiency, replay and DRAM utilization, which are much more visible in the visual profiler.
Apart from 'dynamic' metrics like achieved_occupancy, I never noticed any differences in results. So, is this advice valid? Is there some sort of deficiency in the way nvprof works? I would like to know the advantages of using the visual profiler over the command-line form, if there are any. More specifically, are there metrics for which nvprof gives wrong results?
Note: My question is not the same as this or this, because these are asking about the difference between nvvp and Nsight.
I'm not sure why someone would give you the advice:
Never use the command line.
assuming by "command line" you do in fact mean nvprof. That's not sensible. There are situations where it makes sense to use nvprof. (Note that if you actually meant the command line profiler, then that advice might be somewhat sensible, although still a matter of preference. It is separate from nvprof, so it has a separate learning curve. I personally would use nvprof instead of the command line profiler.)
nvvp uses nvprof under the hood in order to do all of its measurement work. However, nvvp may combine measured metrics in various interesting ways, e.g. to facilitate guided analysis.
nvprof should not give you "wrong results", and if it did for some reason, then nvvp should be equally susceptible to such errors.
Use of nvvp vs. nvprof may be simply a matter of taste or preference.
Many folks will like the convenience of the GUI, and the nvvp GUI offers a "Guided Analysis" mode which nvprof does not. I'm sure an exhaustive list of other differences could be assembled by going through the documentation. But whatever nvvp does, it does it using nvprof. It doesn't have an alternate method to query the device for profiler data -- it uses nvprof.
I would use nvprof when it's inconvenient to use nvvp, perhaps when I am running on a compute cluster node where it's difficult or impossible to launch nvvp. You might also use it if you are doing targeted profiling (measuring a single metric, e.g. shared_replay_overhead; nvprof is certainly quicker than firing up the GUI and running a session), or if you are collecting metrics for tabulation over a large series of runs.
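As a sketch of that kind of targeted, scriptable use (./my_app and metrics.csv are placeholder names; --csv and --log-file tell nvprof to write comma-separated output to a file):

nvprof --metrics shared_replay_overhead --csv --log-file metrics.csv ./my_app

Repeating such a command across many runs or configurations and collecting the CSV output is usually easier than clicking through GUI sessions.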
In most other cases, I personally would use nvvp. The timeline feature itself is hugely more convenient than trying to assemble a sequence in your head from the output of nvprof --print-gpu-trace ..., which is essentially the same info as the timeline.
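For completeness, such a trace run might look like this (./my_app is again a placeholder); it prints one line per kernel launch and memory copy, with timestamps and durations, which is the same raw information the nvvp timeline presents graphically:

nvprof --print-gpu-trace ./my_app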