Tags: onnx, onnxruntime

Debugging "CUDA kernel not found in registries for Op type" in onnxruntime


My model runs much slower in ONNX Runtime than in PyTorch. During session initialization, I get several messages like these:

 [I:onnxruntime:, cuda_execution_provider.cc:2517 GetCapability] CUDA kernel not found in registries for Op type: Equal node name: /Equal
 [I:onnxruntime:, cuda_execution_provider.cc:2517 GetCapability] CUDA kernel not found in registries for Op type: Equal node name: /Equal_1
 [I:onnxruntime:, cuda_execution_provider.cc:2517 GetCapability] CUDA kernel not found in registries for Op type: ConstantOfShape node name: /ConstantOfShape_2
 [I:onnxruntime:, cuda_execution_provider.cc:2517 GetCapability] CUDA kernel not found in registries for Op type: Equal node name: /Equal_2
 [I:onnxruntime:, cuda_execution_provider.cc:2517 GetCapability] CUDA kernel not found in registries for Op type: Equal node name: /Equal_3
 [I:onnxruntime:, cuda_execution_provider.cc:2517 GetCapability] CUDA kernel not found in registries for Op type: Resize node name: /image_encoder/image_encoder.1/Resize
 [I:onnxruntime:, cuda_execution_provider.cc:2517 GetCapability] CUDA kernel not found in registries for Op type: Resize node name: /image_encoder/image_encoder.1/Resize_1
 [I:onnxruntime:, cuda_execution_provider.cc:2517 GetCapability] CUDA kernel not found in registries for Op type: Resize node name: /image_encoder/image_encoder.1/Resize_2
 [I:onnxruntime:, cuda_execution_provider.cc:2517 GetCapability] CUDA kernel not found in registries for Op type: GridSample node name: /GridSample
 [I:onnxruntime:, cuda_execution_provider.cc:2517 GetCapability] CUDA kernel not found in registries for Op type: Equal node name: /Equal_4

I'm wondering whether this might be the cause. Does it mean that operations such as Equal, Resize, and GridSample are being executed on the CPU? If so, how can I debug this? According to https://github.com/microsoft/onnxruntime/blob/rel-1.20.0/docs/OperatorKernels.md, all of these kernels should be implemented for the CUDA execution provider. My onnxruntime version is 1.20.1.
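As a first debugging step, it can help to condense the messages into a count per operator type, so it is clear which ops the CUDA provider did not claim (such nodes are typically assigned to the CPU provider instead). The following is a minimal sketch using only the standard library; the function name `summarize_missing_kernels` and the `sample_log` variable are made up for illustration, and the regular expression assumes the exact message format shown above.

```python
import re
from collections import Counter

# Matches the message format shown in the log above (assumed to be stable).
PATTERN = re.compile(
    r"CUDA kernel not found in registries for Op type: (\S+) node name: (\S+)"
)


def summarize_missing_kernels(log_text):
    """Count, per op type, the nodes for which no CUDA kernel was found."""
    counts = Counter()
    for match in PATTERN.finditer(log_text):
        op_type = match.group(1)
        counts[op_type] += 1
    return counts


# A few lines copied from the log above, used as sample input.
sample_log = """
[I:onnxruntime:, cuda_execution_provider.cc:2517 GetCapability] CUDA kernel not found in registries for Op type: Equal node name: /Equal
[I:onnxruntime:, cuda_execution_provider.cc:2517 GetCapability] CUDA kernel not found in registries for Op type: Equal node name: /Equal_1
[I:onnxruntime:, cuda_execution_provider.cc:2517 GetCapability] CUDA kernel not found in registries for Op type: Resize node name: /image_encoder/image_encoder.1/Resize
[I:onnxruntime:, cuda_execution_provider.cc:2517 GetCapability] CUDA kernel not found in registries for Op type: GridSample node name: /GridSample
"""

for op_type, count in summarize_missing_kernels(sample_log).most_common():
    print(f"{op_type}: {count} node(s) without a CUDA kernel")
```

The op types that show up here are the ones to compare against OperatorKernels.md, paying attention to the operator version and not just the name.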


Solution

  • The issue was caused by operator versioning. I exported with opset 20, so operators such as GridSample were exported at version 20. The CUDA provider, however, only implements GridSample as version 16+, and that kernel evidently did not match the version-20 node. Re-exporting with opset 17 fixed the issue.
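A rough sketch of the fix, assuming the model comes from PyTorch and that the `onnx` and `torch` packages are installed. The helper names (`default_domain_opset`, `model_opset`, `reexport`) and their arguments are hypothetical; `onnx.load`, `model.opset_import`, and the `opset_version` argument of `torch.onnx.export` are the only library features relied on.

```python
def default_domain_opset(opset_imports):
    """Return the opset version of the default ONNX domain, or None.

    `opset_imports` is an iterable of (domain, version) pairs; the default
    domain is written as "" or "ai.onnx".
    """
    for domain, version in opset_imports:
        if domain in ("", "ai.onnx"):
            return version
    return None


def model_opset(path):
    """Read the default-domain opset an exported ONNX file was written with."""
    import onnx  # assumption: the onnx package is available

    model = onnx.load(path)
    pairs = [(imp.domain, imp.version) for imp in model.opset_import]
    return default_domain_opset(pairs)


def reexport(model, example_inputs, path, opset=17):
    """Re-export a PyTorch module with an explicit, lower opset version."""
    import torch  # assumption: the model is a torch.nn.Module

    # example_inputs is a tensor or a tuple of tensors matching the forward() arguments.
    torch.onnx.export(model, example_inputs, path, opset_version=opset)
```

Calling `model_opset("model.onnx")` before and after `reexport(...)` confirms that the file really changed from opset 20 to 17; the file name here is a placeholder.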