c++ cuda gpgpu gpu-warp thread-divergence

Monitor active warps and threads during a divergent CUDA run


I implemented some CUDA code. It runs fine, but the algorithm inherently produces strong thread divergence. This is expected.

I will try to reduce the divergence later, but for the moment I would be happy just to be able to measure it.

Is there an easy way (preferably a runtime API call or a CLI tool) to check how many of my initially scheduled warps and/or threads are still active?


Solution

  • I found a solution that gives me pretty nice results. Calling the following function from selected lines of a kernel (with the filter condition adapted as needed) prints the number of active threads in the current warp:

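    #include <cstdio>

    // Prints how many threads of the calling warp are active at the call site.
    __device__ void printConvergentThreadCount(int line) // Pass __LINE__
    {
       // __activemask() has one bit set per thread currently executing with the caller
       const int count = __popc(__activemask());
       const int threadId = blockIdx.x * blockDim.x + threadIdx.x;
       if (threadId == 0) // Filter: only global thread 0 prints, and only if it reaches this line
       {
          printf("Line %i: %i\n", line, count);
       }
    }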
    

    Still, this does not give me any numbers while the kernels are running.
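
    As a possible variation on the same idea, the counts can be accumulated in device memory instead of printed, so every warp contributes and the host reads a summary once the kernel has finished. The following is only a minimal sketch under stated assumptions: `g_laneHistogram`, `recordActiveLanes`, `divergentKernel` and `runProbeDemo` are hypothetical names made up for this illustration, blocks are assumed to be one-dimensional, and the toy kernel merely stands in for real divergent code. Like the printf version, it does not deliver numbers while the kernel is still running.

    ```cuda
    #include <cstdio>
    #include <cuda_runtime.h>

    constexpr int kWarpSize = 32; // warp size on current NVIDIA GPUs

    // Hypothetical counter array: bin k counts how often a probe saw k active lanes.
    __device__ unsigned int g_laneHistogram[kWarpSize + 1];

    // Probe: one lane per group of threads executing together records the group size.
    __device__ void recordActiveLanes()
    {
        const unsigned int mask = __activemask();
        const int count = __popc(mask);           // lanes executing this call together
        const int leader = __ffs(mask) - 1;       // lowest active lane in that group
        const int lane = threadIdx.x % kWarpSize; // assumes 1D blocks
        if (lane == leader)
        {
            atomicAdd(&g_laneHistogram[count], 1u);
        }
    }

    // Toy kernel (assumption for illustration): only lanes below splitLane reach the probe.
    __global__ void divergentKernel(int splitLane)
    {
        const int lane = threadIdx.x % kWarpSize;
        if (lane < splitLane)
        {
            recordActiveLanes();
        }
    }

    // Clears the histogram, launches the toy kernel and copies the histogram back.
    // Returns false if any CUDA call fails.
    bool runProbeDemo(int blocks, int threadsPerBlock, int splitLane,
                      unsigned int (&histogram)[kWarpSize + 1])
    {
        const unsigned int zeros[kWarpSize + 1] = {};
        if (cudaMemcpyToSymbol(g_laneHistogram, zeros, sizeof(zeros)) != cudaSuccess)
            return false;
        divergentKernel<<<blocks, threadsPerBlock>>>(splitLane);
        if (cudaGetLastError() != cudaSuccess)      // launch configuration errors
            return false;
        if (cudaDeviceSynchronize() != cudaSuccess) // errors during execution
            return false;
        return cudaMemcpyFromSymbol(histogram, g_laneHistogram, sizeof(zeros)) == cudaSuccess;
    }

    // Prints one line per non-empty histogram bin.
    void printHistogram(const unsigned int (&histogram)[kWarpSize + 1])
    {
        for (int k = 1; k <= kWarpSize; ++k)
        {
            if (histogram[k] != 0)
            {
                printf("%2d active lanes: %u probe hits\n", k, histogram[k]);
            }
        }
    }
    ```

    Calling `runProbeDemo(2, 64, 8, histogram)` followed by `printHistogram(histogram)` from `main` would report how the probe hits are distributed. Note that `__activemask()` only reports which threads happen to be executing together at that instant; threads on the same branch are not guaranteed to be converged, so the bins should be read as an estimate rather than an exact divergence measure.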