cuda gpu nvidia hardware

How am I able to run Tensor Core instructions without actually having Tensor Cores?


I'm using CUDA's WMMA API to multiply fragments on the GTX 1660 Ti. This GPU doesn't have Tensor Cores, but when I look at the SASS generated for my code I see HMMA.1688.F32 instructions, which are Tensor Core instructions! How can that happen?

Relevant information:


Solution

  • For binary compatibility, the "non-tensor-core" members of the Turing family have hardware in the SM that processes tensor core instructions, albeit at relatively low throughput compared with a dedicated tensor core unit. This applies to any GPU variant (e.g. GeForce, Quadro) that is derived from or based on the TU116 or TU117 GPUs. The GTX 1660 Ti is a TU116 part, which is why the compiler can emit HMMA instructions for it and the code still runs.
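
If you want to reproduce this yourself, here is a minimal sketch of a WMMA kernel, not the asker's actual code. The kernel and helper names (`wmma_tile`, `fill_ones`) and the all-ones inputs are my own choices for illustration; the 16x16x16 half-in, float-accumulate shape is just one of the shapes the WMMA API supports.

```cuda
#include <cstdio>
#include <cuda_fp16.h>
#include <mma.h>

using namespace nvcuda;

// Hypothetical helper: fill both 16x16 half-precision input tiles with 1.0.
__global__ void fill_ones(half *a, half *b) {
    int i = threadIdx.x;  // launched with 256 threads, one per element
    a[i] = __float2half(1.0f);
    b[i] = __float2half(1.0f);
}

// One warp computes a single 16x16 tile: C = A * B,
// with half-precision inputs and a float accumulator.
__global__ void wmma_tile(const half *a, const half *b, float *c) {
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> c_frag;

    wmma::fill_fragment(c_frag, 0.0f);
    wmma::load_matrix_sync(a_frag, a, 16);  // leading dimension = 16
    wmma::load_matrix_sync(b_frag, b, 16);
    wmma::mma_sync(c_frag, a_frag, b_frag, c_frag);
    wmma::store_matrix_sync(c, c_frag, 16, wmma::mem_row_major);
}

int main() {
    half *a, *b;
    float *c;
    cudaMallocManaged(&a, 256 * sizeof(half));
    cudaMallocManaged(&b, 256 * sizeof(half));
    cudaMallocManaged(&c, 256 * sizeof(float));

    fill_ones<<<1, 256>>>(a, b);
    wmma_tile<<<1, 32>>>(a, b, c);  // WMMA operations are warp-wide: 32 threads

    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Each output element is a dot product of sixteen 1.0 values.
    printf("c[0] = %f\n", c[0]);

    cudaFree(a);
    cudaFree(b);
    cudaFree(c);
    return 0;
}
```

Assuming the file is saved as `wmma_tile.cu`, building with `nvcc -arch=sm_75 wmma_tile.cu -o wmma_tile` and then running `cuobjdump -sass wmma_tile` should let you look for HMMA instructions in the generated SASS. The same binary should run on both tensor-core and non-tensor-core Turing parts; only the throughput differs.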