cuda nvprof

Do the SMs shown in the "occupancy graph" correspond to `blockIdx.x` or register `%smid`?


Do the SMs shown in the "occupancy graph" correspond to `blockIdx.x` or to the register `%smid`?

Here's an example of such a graph:

[Image: example occupancy graph]

And here's some sample output from printing `blockIdx.x` as the "logical" block and the register `%smid` (accessed via assembly) as the physical SM:

running on logical Block 77 and Physical SM 75
running on logical Block 31 and Physical SM 62
running on logical Block 37 and Physical SM 74
running on logical Block 74 and Physical SM 69
running on logical Block 66 and Physical SM 53
running on logical Block 45 and Physical SM 11
running on logical Block 43 and Physical SM 7
<snip>
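
For reference, output of this shape can be produced with a few lines of inline PTX. The following is only a minimal sketch, not the code actually used for the output above: the names `get_smid`, `report_placement`, and `run_report` are hypothetical, and the launch configuration of 80 blocks of 32 threads is an assumption based on the block count mentioned in the question. Note that `%smid` reports where a thread is running at the moment it is read, so the value could in principle change if the thread is rescheduled after preemption.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Read the PTX special register %smid: the ID of the SM that the calling
// thread is executing on at the moment of the read.
__device__ unsigned int get_smid() {
    unsigned int smid;
    asm volatile("mov.u32 %0, %%smid;" : "=r"(smid));
    return smid;
}

// Each block prints its logical index and the physical SM it landed on.
__global__ void report_placement() {
    // One line per block: thread 0 speaks for the whole block.
    if (threadIdx.x == 0) {
        printf("running on logical Block %u and Physical SM %u\n",
               blockIdx.x, get_smid());
    }
}

// Launch the kernel and wait for it to finish; returns true on success.
bool run_report(int num_blocks) {
    report_placement<<<num_blocks, 32>>>();
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return false;
    }
    return true;
}

int main() {
    // 80 blocks, matching the block count mentioned in the question (assumed).
    return run_report(80) ? 0 : 1;
}
```

The order of the printed lines and the block-to-SM mapping are decided by the hardware scheduler, so they will generally differ from run to run.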

Additionally, I can roughly predict how long each block will take to execute. Blocks ~30 and ~31 aren't predicted to take long, yet they show up as long-running on the graph. Likewise, some blocks that I predict to be short take longer than expected according to the graph. So the durations shown in the graph don't match the logical block numbers.

Finally, given that the term "SM" is used in the graph, I would expect the graph to represent the physical SM ID, not the logical block number.

Given these three pieces of evidence, I suspect that the number shown in the graph corresponds to the register %smid.

All I'm looking for is confirmation. Once I fully understand the issue, I'll be justified in either better distributing the work across the existing 80 blocks, or just using more blocks to begin with. Thanks!


Solution

  • Quoting Robert Crovella:

    The horizontal axis of the graph corresponds to the register %smid. SM = Streaming Multiprocessor. You could quickly confirm this by running more than 80 blocks and observing that the presented graph does not proceed beyond 0..79 on the horizontal axis.
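
The same idea can also be checked outside the profiler. The sketch below is an illustration under stated assumptions, not part of the quoted answer: it launches twice as many blocks as the device has SMs, records the `%smid` each block observes, and counts the distinct values on the host. The names `record_smid` and `collect_smids` are hypothetical, and device 0 is assumed. Since blocks can only run on SMs that exist, the number of distinct IDs should not exceed the SM count the device reports, however many blocks are launched.

```cuda
#include <cstdio>
#include <set>
#include <vector>
#include <cuda_runtime.h>

// Each block stores the ID of the SM it is running on, indexed by blockIdx.x.
__global__ void record_smid(unsigned int *smid_of_block) {
    if (threadIdx.x == 0) {
        unsigned int smid;
        asm volatile("mov.u32 %0, %%smid;" : "=r"(smid));
        smid_of_block[blockIdx.x] = smid;
    }
}

// Launch num_blocks blocks and return the SM ID each one reported.
// Returns an empty vector if any CUDA call fails.
std::vector<unsigned int> collect_smids(int num_blocks) {
    std::vector<unsigned int> host(num_blocks, 0);
    const size_t bytes = num_blocks * sizeof(unsigned int);
    unsigned int *dev = nullptr;
    if (cudaMalloc(&dev, bytes) != cudaSuccess) return {};
    record_smid<<<num_blocks, 32>>>(dev);
    // cudaMemcpy waits for the kernel and surfaces any launch/runtime error.
    cudaError_t err = cudaMemcpy(host.data(), dev, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dev);
    if (err != cudaSuccess) return {};
    return host;
}

int main() {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) return 1;

    // Deliberately launch more blocks than there are SMs.
    const int num_blocks = 2 * prop.multiProcessorCount;
    std::vector<unsigned int> ids = collect_smids(num_blocks);
    if (ids.empty()) return 1;

    std::set<unsigned int> distinct(ids.begin(), ids.end());
    printf("%d blocks ran on %zu distinct SMs (device reports %d SMs)\n",
           num_blocks, distinct.size(), prop.multiProcessorCount);
    return 0;
}
```

If the logical block index were what the graph plotted, its axis would grow with the block count; the SM IDs collected here cannot.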