I have been profiling an application with nvprof and nvvp (5.5) in order to optimize it. However, I get totally different results for some metrics/events like inst_replay_overhead, ipc or branch_efficiency, etc. when I'm profiling the debug (-G) and release version of the code.
So my question is: which version should I profile, the release or the debug version? Or does the choice depend on what I'm looking for?
I found the question "CUDA - Visual Profiler and Control Flow Divergence", where it is stated that a debug (-G) version is needed to properly measure the divergent branches metric, but I am not sure about other metrics.
Profiling usually implies that you care about performance.
If you care about performance, you should profile the release version of a CUDA code.
The debug version (-G) will generate different code, which usually runs slower. For this reason, in my opinion, there's little point in doing performance analysis (including execution time measurement, benchmarking, profiling, etc.) on a debug version of a CUDA code.
The -G switch turns off most optimizations that the device code compiler might ordinarily make, which has a large effect on code generation and often a large effect on performance as well. Optimizations are disabled to make the code easier to debug, which is the primary purpose of the -G switch and of a debug version of your code.
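If you want to see the difference for yourself, something like the following minimal sketch may help. The kernel name, the file name app.cu and the problem size are made up for illustration and are not taken from your application; the build and profile commands in the comments are the ones I would try, assuming a toolkit that still ships nvprof.

```cuda
// app.cu (hypothetical file name)
//
// Debug build:    nvcc -G -g -o app_debug app.cu
// Release build:  nvcc -O3 -lineinfo -o app_release app.cu
//
// Profile each binary with the same metrics and compare:
//   nvprof --metrics branch_efficiency,ipc,inst_replay_overhead ./app_debug
//   nvprof --metrics branch_efficiency,ipc,inst_replay_overhead ./app_release

#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: even and odd threads of a warp take different paths.
// In a release build the compiler may turn this branch into predicated
// instructions, so the reported divergence can differ from the -G build.
__global__ void divergent(float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    if (i % 2 == 0)
        out[i] = 2.0f * i;
    else
        out[i] = 0.5f * i;
}

int main()
{
    const int n = 1 << 20;   // arbitrary problem size
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    float *d_out = NULL;

    cudaError_t err = cudaMalloc((void **)&d_out, n * sizeof(float));
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    divergent<<<blocks, threads>>>(d_out, n);

    // Wait for the kernel and report any launch or execution error.
    err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        cudaFree(d_out);
        return 1;
    }

    cudaFree(d_out);
    cudaDeviceReset();   // lets the profiler flush its data before exit
    return 0;
}
```

The -lineinfo flag on the release build keeps source line information without disabling optimizations, so the profiler can still attribute results to source lines while measuring the code you would actually ship.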