[SOLVED] Profile debug or release cuda code?

Profile debug or release cuda code?

I have been profiling an application with nvprof and nvvp (5.5) in order to optimize it. However, I get totally different results for some metrics/events like inst_replay_overhead, ipc or branch_efficiency, etc. when I'm profiling the debug (-G) and release version of the code.

so my question is: which version should I profile? The release or debug version? Or the choice depends upon what I'm looking for?

I found CUDA - Visual Profiler and Control Flow Divergence where is stated that a debug (-G) version is needed to properly measure the divergent branches metric, but I am not sure about other metrics.

Solution

Profiling usually implies that you care about performance.

If you care about performance, you should profile the release version of a CUDA code.

The debug version (-G) will generate different code, which usually runs slower. There's little point in doing performance analysis (including execution time measurement, benchmarking, profiling, etc.) on a debug version of a CUDA code, in my opinion, for this reason.

The -G switch turns off most optimizations that the device code compiler might ordinarily make, which has a large effect on code generation and also often a large effect on performance. The reason for the disabling of optimizations is to facilitate debug of code, which is the primary reason for the -G switch and for a debug version of your code.