cudansight-compute

Interpreting compute workload analysis in Nsight Compute


Compute Workload Analysis displays the utilization of different compute pipelines. I know that in a modern GPU, integer and floating point pipelines are different hardware units and can execute in parallel. However, it is not very clear which pipeline represents which hardware unit for the other pipelines. I also couldn't find any documentation online about abbreviations and interpretations of the pipelines.

My questions are:

1) What are the full names of ADU, CBU, TEX, XU? How do they map to the hardware?

2) Which of the pipelines utilize the same hardware unit(e.g. FP16, FMA, FP64 uses floating point unit)?

3) A warp scheduler in a modern GPU can schedule 2 instructions per cycle(using different pipelines). Which pipelines can be used at the same time(e.g FMA-ALU, FMA-SFU, ALU-Tensor etc.)?

P.s.: I am adding the screenshot for those who are not familiar with Nsight Compute.enter image description here


Solution

  • The Volta (CC 7.0) and Turing (CC 7.5) SM is comprised of 4 sub-partitions (SMSP). Each sub-partition contains

    The contains several other partitions that contains execution units and resources shared by the 4 sub-partitions including

    In Volta (CC7.0, 7.2) and Turing (CC7.5) each SM sub-partition can issue 1 instruction per cycle. The instruction can be issued to a local execution unit or the SM shared execution units.