
CUDA kernel launched from Nsight Compute gives inconsistent results


I have finished writing my CUDA kernel and confirmed that it runs as expected when compiled directly with nvcc by:

  1. Validating with test data over 100 runs (just in case)
  2. Using cuda-memcheck (memcheck, synccheck, racecheck, initcheck)

Yet, while the application is being profiled with Nsight Compute, the results it prints to the terminal differ from run to run. I am curious whether this difference is a cause for concern or is expected behavior.

Note: The application also gives correct and consistent results while being profiled by nvprof.


Solution

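  • I resolved the issue by fixing my shared memory initialization: I was performing atomicAdd into shared memory that had never been initialized. As @Jackson stated, Nsight Compute runs a kernel multiple times (it replays the kernel to collect its metrics), so the effects of the uninitialized memory were amplified. This would also explain why cuda-memcheck did not catch the bug: its initcheck tool only tracks accesses to device global memory, not shared memory.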
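The fix described above can be sketched as follows. The original kernel is not shown in the question, so this is a hypothetical stand-in: a shared-memory histogram that, like the asker's code, accumulates with atomicAdd into a `__shared__` array. The names `histogramKernel`, `runHistogram`, and `NUM_BINS` are made up for illustration. The key lines are the loop that zeroes the shared array and the `__syncthreads()` that follows it, before any thread calls atomicAdd.

```cuda
#include <cassert>
#include <cuda_runtime.h>
#include <vector>

constexpr int NUM_BINS = 16;  // hypothetical bin count for this sketch

// Hypothetical kernel: each block builds a partial histogram in shared memory
// using atomicAdd, then merges it into a global histogram.
__global__ void histogramKernel(const int* in, int n, unsigned int* globalHist) {
    __shared__ unsigned int localHist[NUM_BINS];

    // FIX: __shared__ memory is not zero-initialized. Clear every bin before
    // any atomicAdd touches it; otherwise each block starts from whatever
    // values happen to be left over in shared memory.
    for (int b = threadIdx.x; b < NUM_BINS; b += blockDim.x) {
        localHist[b] = 0;
    }
    __syncthreads();  // every bin must be zeroed before any thread accumulates

    // Grid-stride loop over the input; assumes non-negative input values.
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += gridDim.x * blockDim.x) {
        atomicAdd(&localHist[in[i] % NUM_BINS], 1u);
    }
    __syncthreads();  // all per-block accumulation must finish before merging

    for (int b = threadIdx.x; b < NUM_BINS; b += blockDim.x) {
        atomicAdd(&globalHist[b], localHist[b]);
    }
}

// Hypothetical host helper: histograms the values 0..n-1 and returns the bins.
// CUDA error checking is omitted to keep the sketch short.
std::vector<unsigned int> runHistogram(int n) {
    assert(n > 0);  // this sketch assumes a non-empty input
    std::vector<int> hIn(n);
    for (int i = 0; i < n; ++i) hIn[i] = i;

    int* dIn = nullptr;
    unsigned int* dHist = nullptr;
    cudaMalloc(&dIn, n * sizeof(int));
    cudaMalloc(&dHist, NUM_BINS * sizeof(unsigned int));
    cudaMemcpy(dIn, hIn.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    // The global accumulator needs zeroing too: cudaMalloc does not clear memory.
    cudaMemset(dHist, 0, NUM_BINS * sizeof(unsigned int));

    histogramKernel<<<8, 128>>>(dIn, n, dHist);

    std::vector<unsigned int> hHist(NUM_BINS);
    cudaMemcpy(hHist.data(), dHist, NUM_BINS * sizeof(unsigned int),
               cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dHist);
    return hHist;
}
```

Calling `runHistogram(1600)` should give 100 in each of the 16 bins on every call. With the zeroing loop removed, the result depends on leftover shared-memory contents, so it may still look correct in a plain run and then vary once the kernel is replayed under a profiler.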