cudadoublekepler

On Double Precision Units (DPUs) on Kepler K20Xm


According to the Kepler architecture whitepaper, a SMX has 192 CUDA cores and 64 Double Precision Units (DPUs). For a K20Xm there are 14 SMXs totalling at 2688 cores, which means that only the CUDA cores are counted. What exactly is then the usage of the DPUs for and how is their usage related to the cores?

My thoughts:

a) The CUDA cores can't do double precision operations and only the DPUs can. Therefore, the CUDA cores are free for other stuff while the DPUs are busy.

b) The CUDA cores somehow need a double precision unit to do double precision operations, therefore only 128 of the 192 CUDA cores are available for other stuff.

Cheers Andi


Solution

  • The double precision units are actually separate hardware floating point units that do double precision arithmetic. They are independent from the "cuda cores", which roughly speaking, could be considered to be the single-precision units.

    So for single precision arithmetic, the throughput can be computed based on the "cuda cores" or single precision units. For double precision arithmetic, the throughput must be computed based on the double precision units.

    In a Kepler K20 SMX, the ratio of double-precision units to single precision units is 1:3. Therefore the throughput for each type of arithmetic follows the same ratio. By "arithmetic" I mean here floating point multiply or floating point add.