I want to use the tensor cores on my GPU to run inference for some CNN models. Do frameworks such as PyTorch, TensorFlow, or MXNet (or any other framework, for that matter) support inference on tensor cores?
I've heard that tensor cores can be used for training, since PyTorch has built-in support for that. I'm not sure whether the same can be done for inference.
All frameworks can use tensor cores, for inference as well as training, provided that 1. your GPU has tensor cores and 2. your model can actually take advantage of them (it runs in mixed precision, all matmul dimensions are multiples of 8, etc.).
For PyTorch, you can read more here
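To make the "multiple of 8" condition above concrete, here is a minimal sketch in plain Python that checks the three dimensions of a matmul (an `m x k` matrix times a `k x n` matrix) and reports what each misaligned one would need to be padded to. The helper names `round_up` and `misaligned_dims` are hypothetical, not part of any framework, and the default of 8 is simply the figure quoted above; the right value may depend on the data type and library version, so treat it as an assumption.

```python
def round_up(size: int, multiple: int = 8) -> int:
    """Return the smallest multiple of `multiple` that is >= size."""
    if size < 0 or multiple <= 0:
        raise ValueError("size must be >= 0 and multiple must be > 0")
    return ((size + multiple - 1) // multiple) * multiple


def misaligned_dims(m: int, k: int, n: int, multiple: int = 8) -> dict:
    """For an (m x k) @ (k x n) matmul, map each dimension that is not a
    multiple of `multiple` to the padded size it would need.

    An empty dict means every dimension already satisfies the rule.
    `multiple=8` is an assumption taken from the answer above.
    """
    dims = {"m": m, "k": k, "n": n}
    return {
        name: round_up(size, multiple)
        for name, size in dims.items()
        if size % multiple != 0
    }


if __name__ == "__main__":
    # Hypothetical layer shapes, chosen only for illustration.
    print(misaligned_dims(64, 33278, 512))
    print(misaligned_dims(64, 512, 512))
```

An empty result only says the shapes are aligned; whether tensor cores are actually used still depends on the GPU, the precision the model runs in, and the kernels the framework selects.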