In CUDA PTX, there's a special register which holds the index of a thread's warp: %warpid. Now, the spec says:
Note that %warpid is volatile and returns the location of a thread at the moment when read, but its value may change during execution, e.g., due to rescheduling of threads following preemption.
Umm, what location is that? Shouldn't it be the location within the block, e.g. for a 1-dimensional grid %tid.x / warpSize? Is it some slot-for-a-warp within the SM (e.g. warp scheduler or some internal queue)? I'm confused.
Motivation: I wanted to spare myself the trouble of calculating %tid.x / warpSize, as well as free up a register, by using this special register. However, in retrospect this is a false motivation, because reading a special register is expensive; see: What's the most efficient way to calculate the warp id / lane id in a 1-D grid?
You need to read the next 25 words of the documentation, which directly follow the quotation you posted in your question:
For this reason, %ctaid and %tid should be used to compute a virtual warp index if such a value is needed in kernel code;
and then
%warpid is intended mainly to enable profiling and diagnostic code to sample and log information such as work place mapping and load distribution.
So no, you can't use it for what you want. %warpid is effectively a scheduler slot ID rather than a constant, unique warp index within a block.
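To see the difference for yourself, here is a minimal sketch that prints both values side by side. The helper names virtual_warp_index and physical_warp_slot, the kernel name, and the launch configuration are my own, chosen for illustration; the only things taken from the documentation are the %warpid register itself and the advice to derive the virtual index from %tid. It assumes warps are formed from consecutive linearized thread indices within the block.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: the "virtual" warp index the documentation recommends,
// derived from the thread's linear index within its block.
__host__ __device__ inline unsigned virtual_warp_index(unsigned linear_tid,
                                                       unsigned warp_size)
{
    return linear_tid / warp_size;
}

// Hypothetical helper: reads the %warpid special register via inline PTX.
// The value may differ between reads and is not a stable index within the block.
__device__ inline unsigned physical_warp_slot()
{
    unsigned id;
    asm volatile("mov.u32 %0, %%warpid;" : "=r"(id));
    return id;
}

__global__ void compare_warp_ids()
{
    // Linearize the thread index so the sketch also works for 2-D/3-D blocks.
    unsigned linear_tid = threadIdx.x
                        + blockDim.x * (threadIdx.y + blockDim.y * threadIdx.z);

    unsigned virt = virtual_warp_index(linear_tid, warpSize);
    unsigned slot = physical_warp_slot();

    // Let only the first lane of each warp print, to keep the output short.
    if (linear_tid % warpSize == 0) {
        printf("block %u: virtual warp %u, %%warpid %u\n", blockIdx.x, virt, slot);
    }
}

int main()
{
    // Arbitrary launch configuration for illustration: 4 blocks of 128 threads.
    compare_warp_ids<<<4, 128>>>();

    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

The virtual index is the same on every run for a given thread. The %warpid column need not match it, and may vary between blocks, runs, and devices, so treat whatever it prints as diagnostic information only.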