cuda ptx

In CUDA PTX, what does %warpid mean, really?


In CUDA PTX, there's a special register that holds the index of a thread's warp: %warpid. Now, the spec says:

Note that %warpid is volatile and returns the location of a thread at the moment when read, but its value may change during execution, e.g., due to rescheduling of threads following preemption.

Umm, what location is that? Shouldn't it be the location within the block, e.g. for a 1-dimensional grid %tid.x / warpSize? Is it some slot-for-a-warp within the SM (e.g. warp scheduler or some internal queue)? I'm confused.

Motivation: I wanted to spare myself the trouble of calculating %tid.x / warpSize, and to free up a register, by using this special register instead. In retrospect, however, that motivation was misguided, because reading a special register is expensive; see: "What's the most efficient way to calculate the warp id / lane id in a 1-D grid?"


Solution

  • You need to read the next 25 words of the documentation, which directly follow the passage you quoted in your question:

    For this reason, %ctaid and %tid should be used to compute a virtual warp index if such a value is needed in kernel code;

    and then

    %warpid is intended mainly to enable profiling and diagnostic code to sample and log information such as work place mapping and load distribution.

    So no, you can't use it for what you want. %warpid is effectively a scheduler slot ID rather than a constant, unique warp index within a block.
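
As a rough illustration of the documentation's advice, here is a minimal sketch that computes a virtual warp index from the thread index and, purely for diagnostics, reads %warpid through inline PTX. The helper names (linear_thread_index, virtual_warp_index, physical_warp_id, compare_warp_ids) are made up for this example and are not part of any CUDA API; the linearization assumes the usual x-fastest ordering of threads within a block.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: linearize a (possibly multi-dimensional) thread index,
// with x varying fastest, then y, then z.
__host__ __device__ inline unsigned linear_thread_index(
    unsigned tx, unsigned ty, unsigned tz, unsigned nx, unsigned ny)
{
    return tx + nx * (ty + ny * tz);
}

// Hypothetical helper: the "virtual" warp index within the block, i.e. the
// linear thread index divided by the warp size. This value is stable for the
// lifetime of the thread, unlike %warpid.
__host__ __device__ inline unsigned virtual_warp_index(
    unsigned tx, unsigned ty, unsigned tz,
    unsigned nx, unsigned ny, unsigned warp_size)
{
    return linear_thread_index(tx, ty, tz, nx, ny) / warp_size;
}

// Hypothetical helper: read the %warpid special register via inline PTX.
// Per the PTX documentation the value is volatile, so treat it as a
// diagnostic sample only, never as a logical index.
__device__ inline unsigned physical_warp_id()
{
    unsigned id;
    asm volatile("mov.u32 %0, %%warpid;" : "=r"(id));
    return id;
}

// Prints both values once per warp so they can be compared by eye.
__global__ void compare_warp_ids()
{
    unsigned linear = linear_thread_index(threadIdx.x, threadIdx.y, threadIdx.z,
                                          blockDim.x, blockDim.y);
    unsigned virt = virtual_warp_index(threadIdx.x, threadIdx.y, threadIdx.z,
                                       blockDim.x, blockDim.y, warpSize);
    unsigned phys = physical_warp_id();
    if (linear % warpSize == 0) {  // first lane of each warp
        printf("block %u: virtual warp %u, %%warpid %u\n",
               blockIdx.x, virt, phys);
    }
}

int main()
{
    compare_warp_ids<<<2, 128>>>();   // launch configuration chosen arbitrarily
    cudaDeviceSynchronize();
    return 0;
}
```

The two values may or may not coincide on a given run; the point is only that the virtual index depends solely on %tid and the block dimensions, while %warpid reflects wherever the hardware happened to place the warp at the moment of the read.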