I'm using CUDA's WMMA API to multiply fragments on the GTX 1660 Ti. This GPU doesn't have Tensor Cores, but when I look at the SASS generated for my code I see HMMA.1688.F32 instructions, which are Tensor Core instructions! How can that happen?
Relevant information:
For binary compatibility, the "non-Tensor-Core" members of the Turing family have hardware in the SM that processes Tensor Core instructions, albeit at relatively low throughput compared to an actual Tensor Core unit. This applies to any GPU variant (e.g. GeForce, Quadro) that is derived from or based on the TU116 or TU117 GPUs, which includes the GTX 1660 Ti (TU116). That is why the compiler can emit HMMA instructions for sm_75 and the code still runs on this card.
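To see this for yourself, here is a minimal sketch of a single-warp WMMA multiply. It is not the code from the question. The kernel name `wmma_tile` and the host helper `run_wmma_tile` are made up for illustration, and the 16x16x16 half-in, float-accumulate shape is only an assumption about what the original code used.

```cuda
#include <cuda_fp16.h>
#include <cuda_runtime.h>
#include <mma.h>

using namespace nvcuda;

// One warp computes C = A * B for a single 16x16x16 tile
// (half inputs, float accumulator).
__global__ void wmma_tile(const half *a, const half *b, float *c) {
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> c_frag;

    wmma::fill_fragment(c_frag, 0.0f);               // start from C = 0
    wmma::load_matrix_sync(a_frag, a, 16);           // leading dimension 16
    wmma::load_matrix_sync(b_frag, b, 16);
    wmma::mma_sync(c_frag, a_frag, b_frag, c_frag);  // the call that lowers to HMMA
    wmma::store_matrix_sync(c, c_frag, 16, wmma::mem_row_major);
}

// Hypothetical host helper: fills A and B with ones, runs the kernel on one
// warp, and returns C[0]. Returns -1.0f if any CUDA call fails.
float run_wmma_tile() {
    const int n = 16 * 16;
    half h_a[n], h_b[n];
    float h_c[n];
    for (int i = 0; i < n; ++i) {
        h_a[i] = __float2half(1.0f);
        h_b[i] = __float2half(1.0f);
    }

    half *d_a = nullptr, *d_b = nullptr;
    float *d_c = nullptr;
    bool ok = cudaMalloc(&d_a, n * sizeof(half)) == cudaSuccess &&
              cudaMalloc(&d_b, n * sizeof(half)) == cudaSuccess &&
              cudaMalloc(&d_c, n * sizeof(float)) == cudaSuccess;

    if (ok) {
        ok = cudaMemcpy(d_a, h_a, n * sizeof(half), cudaMemcpyHostToDevice) == cudaSuccess &&
             cudaMemcpy(d_b, h_b, n * sizeof(half), cudaMemcpyHostToDevice) == cudaSuccess;
    }
    if (ok) {
        wmma_tile<<<1, 32>>>(d_a, d_b, d_c);  // exactly one warp
        ok = cudaDeviceSynchronize() == cudaSuccess &&
             cudaMemcpy(h_c, d_c, n * sizeof(float), cudaMemcpyDeviceToHost) == cudaSuccess;
    }

    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_c);
    return ok ? h_c[0] : -1.0f;
}
```

With every input set to 1, each element of C is a sum of sixteen ones, so `run_wmma_tile()` should return 16 on any GPU that accepts sm_75 code, including the TU116-based cards. Assuming the file is saved as `wmma_demo.cu` (a made-up name), compiling with `nvcc -arch=sm_75 -cubin -o wmma_demo.cubin wmma_demo.cu` and dumping it with `cuobjdump -sass wmma_demo.cubin` should show HMMA instructions like the ones in the question. The SASS is the same whether or not the card has dedicated Tensor Core units. Only the throughput differs.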