[SOLVED] faster_rcnn_r50 pretrained converted to ONNX hosted in Triton model server

I followed the MMDetection documentation for converting a PyTorch model to ONNX (link).

Everything is installed correctly. I'm using onnxruntime==1.8.1 along with the MMCV custom operators for ONNX Runtime (MMCV_WITH_OPS).

I'm using configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py as the Faster R-CNN config (link) together with the R-50-FPN pretrained model (link).

I ran the following command to convert the pretrained model to ONNX. It successfully saved an ONNX file named fasterrcnn.onnx:

python tools/deployment/pytorch2onnx.py \
    configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py \
    checkpoints/faster_rcnn/faster_rcnn_r50_fpn_mstrain_3x_coco_20210524_110822-e10bd31c.pth \
    --output-file checkpoints/faster_rcnn/fasterrcnn.onnx \
    --input-img demo/demo.jpg \
    --test-img tests/data/color.jpg \
    --shape 608 608 \
    --dynamic-export \
    --cfg-options \
      model.test_cfg.deploy_nms_pre=-1

I am serving that ONNX file with the NVIDIA Triton model server, and the model loads successfully:

 fasterrcnn_model | 1       | READY

The model configuration that Triton reports for the ONNX model is shown below:

{
    "name": "fasterrcnn_model",
    "platform": "onnxruntime_onnx",
    "backend": "onnxruntime",
    "version_policy": {
        "latest": {
            "num_versions": 1
        }
    },
    "max_batch_size": 1,
    "input": [
        {
            "name": "input",
            "data_type": "TYPE_FP32",
            "dims": [
                3,
                -1,
                -1
            ]
        }
    ],
    "output": [
        {
            "name": "labels",
            "data_type": "TYPE_INT64",
            "dims": [
                -1
            ]
        },
        {
            "name": "dets",
            "data_type": "TYPE_FP32",
            "dims": [
                -1,
                5
            ]
        }
    ],
    "batch_input": [],
    "batch_output": [],
    "optimization": {
        "priority": "PRIORITY_DEFAULT",
        "input_pinned_memory": {
            "enable": true
        },
        "output_pinned_memory": {
            "enable": true
        },
        "gather_kernel_buffer_threshold": 0,
        "eager_batching": false
    },
    "instance_group": [
        {
            "name": "fasterrcnn_model",
            "kind": "KIND_CPU",
            "count": 1,
            "gpus": [],
            "profile": []
        }
    ],
    "default_model_filename": "model.onnx",
    "cc_model_filenames": {},
    "metric_tags": {},
    "parameters": {},
    "model_warmup": []
}

The summary shows that the model has one FP32 input named "input" with dims [3, -1, -1] and two outputs, "labels" and "dets".
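For reference, here is a minimal sketch of how a request tensor matching that input spec could be prepared. It assumes the image arrives as an HxWx3 uint8 array in BGR order (as OpenCV would load it) and uses the normalization constants from MMDetection's standard COCO dataset config; both are assumptions that should be checked against the actual config. The function name `preprocess` is hypothetical.

```python
import numpy as np

# Normalization constants from MMDetection's standard COCO config
# (img_norm_cfg with to_rgb=True). Assumption: verify against your own config.
MEAN = np.array([123.675, 116.28, 103.53], dtype=np.float32)
STD = np.array([58.395, 57.12, 57.375], dtype=np.float32)


def preprocess(image_bgr: np.ndarray) -> np.ndarray:
    """Turn an HxWx3 uint8 BGR image into a (1, 3, H, W) float32 tensor."""
    if image_bgr.ndim != 3 or image_bgr.shape[2] != 3:
        raise ValueError("expected an HxWx3 image")
    rgb = image_bgr[:, :, ::-1].astype(np.float32)  # BGR -> RGB
    normalized = (rgb - MEAN) / STD                 # per-channel normalization
    chw = np.transpose(normalized, (2, 0, 1))       # HWC -> CHW, matches dims [3, -1, -1]
    return np.ascontiguousarray(chw[np.newaxis])    # leading batch dimension of 1
```

Because max_batch_size is 1, Triton expects the batch dimension in front of the configured dims, which is why the sketch returns a 4-D array. Resizing to the export shape (608 x 608 in the command above) is left out here.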

After sending an inference request with a sample image to Triton, I get the following responses.

labels

[[ 0.  1.  2.  3.  4.  5.  6.  7.  8.  9. 10. 11. 12. 13. 14. 15. 16. 17.
  18. 19. 20. 21. 22. 23. 24. 25. 26. 27. 28. 29. 30. 31. 32. 33. 34. 35.
  36. 37. 38. 39. 40. 41. 42. 43. 44. 45. 46. 47. 48. 49. 50. 51. 52. 53.
  54. 55. 56. 57. 58. 59. 60. 61. 62. 63. 64. 65. 66. 67. 68. 69. 70. 71.
  72. 73. 74. 75. 76. 77. 78. 79.  0.  1.  2.  3.  4.  5.  6.  7.  8.  9.
  10. 11. 12. 13. 14. 15. 16. 17. 18. 19.]]

dets

[[[-1.0000e+00 -1.0000e+00 -1.0000e+00 -1.0000e+00  0.0000e+00]
  [-3.0000e+02 -3.0000e+02 -3.0000e+02 -3.0000e+02  0.0000e+00]
  [-5.9900e+02 -5.9900e+02 -5.9900e+02 -5.9900e+02  0.0000e+00]
  [-8.9800e+02 -8.9800e+02 -8.9800e+02 -8.9800e+02  0.0000e+00]
  [-1.1970e+03 -1.1970e+03 -1.1970e+03 -1.1970e+03  0.0000e+00]
  [-1.4960e+03 -1.4960e+03 -1.4960e+03 -1.4960e+03  0.0000e+00]
  [-1.7950e+03 -1.7950e+03 -1.7950e+03 -1.7950e+03  0.0000e+00]
  [-2.0940e+03 -2.0940e+03 -2.0940e+03 -2.0940e+03  0.0000e+00]
  [-2.3930e+03 -2.3930e+03 -2.3930e+03 -2.3930e+03  0.0000e+00]
  [-2.6920e+03 -2.6920e+03 -2.6920e+03 -2.6920e+03  0.0000e+00]
  [-2.9910e+03 -2.9910e+03 -2.9910e+03 -2.9910e+03  0.0000e+00]
  [-3.2900e+03 -3.2900e+03 -3.2900e+03 -3.2900e+03  0.0000e+00]
  [-3.5890e+03 -3.5890e+03 -3.5890e+03 -3.5890e+03  0.0000e+00]
  [-3.8880e+03 -3.8880e+03 -3.8880e+03 -3.8880e+03  0.0000e+00]
  [-4.1870e+03 -4.1870e+03 -4.1870e+03 -4.1870e+03  0.0000e+00]
  [-4.4860e+03 -4.4860e+03 -4.4860e+03 -4.4860e+03  0.0000e+00]
  [-4.7850e+03 -4.7850e+03 -4.7850e+03 -4.7850e+03  0.0000e+00]
  [-5.0840e+03 -5.0840e+03 -5.0840e+03 -5.0840e+03  0.0000e+00]
  [-5.3830e+03 -5.3830e+03 -5.3830e+03 -5.3830e+03  0.0000e+00]
  [-5.6820e+03 -5.6820e+03 -5.6820e+03 -5.6820e+03  0.0000e+00]
  [-5.9810e+03 -5.9810e+03 -5.9810e+03 -5.9810e+03  0.0000e+00]
  [-6.2800e+03 -6.2800e+03 -6.2800e+03 -6.2800e+03  0.0000e+00]
  [-6.5790e+03 -6.5790e+03 -6.5790e+03 -6.5790e+03  0.0000e+00]
  [-6.8780e+03 -6.8780e+03 -6.8780e+03 -6.8780e+03  0.0000e+00]
  [-7.1770e+03 -7.1770e+03 -7.1770e+03 -7.1770e+03  0.0000e+00]
  [-7.4760e+03 -7.4760e+03 -7.4760e+03 -7.4760e+03  0.0000e+00]
  [-7.7750e+03 -7.7750e+03 -7.7750e+03 -7.7750e+03  0.0000e+00]
  [-8.0740e+03 -8.0740e+03 -8.0740e+03 -8.0740e+03  0.0000e+00]
  [-8.3730e+03 -8.3730e+03 -8.3730e+03 -8.3730e+03  0.0000e+00]
  [-8.6720e+03 -8.6720e+03 -8.6720e+03 -8.6720e+03  0.0000e+00]
  [-8.9710e+03 -8.9710e+03 -8.9710e+03 -8.9710e+03  0.0000e+00]
  [-9.2700e+03 -9.2700e+03 -9.2700e+03 -9.2700e+03  0.0000e+00]
  [-9.5690e+03 -9.5690e+03 -9.5690e+03 -9.5690e+03  0.0000e+00]
  [-9.8680e+03 -9.8680e+03 -9.8680e+03 -9.8680e+03  0.0000e+00]
  [-1.0167e+04 -1.0167e+04 -1.0167e+04 -1.0167e+04  0.0000e+00]
  [-1.0466e+04 -1.0466e+04 -1.0466e+04 -1.0466e+04  0.0000e+00]
  [-1.0765e+04 -1.0765e+04 -1.0765e+04 -1.0765e+04  0.0000e+00]
  [-1.1064e+04 -1.1064e+04 -1.1064e+04 -1.1064e+04  0.0000e+00]
  [-1.1363e+04 -1.1363e+04 -1.1363e+04 -1.1363e+04  0.0000e+00]
  [-1.1662e+04 -1.1662e+04 -1.1662e+04 -1.1662e+04  0.0000e+00]
  [-1.1961e+04 -1.1961e+04 -1.1961e+04 -1.1961e+04  0.0000e+00]
  [-1.2260e+04 -1.2260e+04 -1.2260e+04 -1.2260e+04  0.0000e+00]
  [-1.2559e+04 -1.2559e+04 -1.2559e+04 -1.2559e+04  0.0000e+00]
  [-1.2858e+04 -1.2858e+04 -1.2858e+04 -1.2858e+04  0.0000e+00]
  [-1.3157e+04 -1.3157e+04 -1.3157e+04 -1.3157e+04  0.0000e+00]
  [-1.3456e+04 -1.3456e+04 -1.3456e+04 -1.3456e+04  0.0000e+00]
  [-1.3755e+04 -1.3755e+04 -1.3755e+04 -1.3755e+04  0.0000e+00]
  [-1.4054e+04 -1.4054e+04 -1.4054e+04 -1.4054e+04  0.0000e+00]
  [-1.4353e+04 -1.4353e+04 -1.4353e+04 -1.4353e+04  0.0000e+00]
  [-1.4652e+04 -1.4652e+04 -1.4652e+04 -1.4652e+04  0.0000e+00]
  [-1.4951e+04 -1.4951e+04 -1.4951e+04 -1.4951e+04  0.0000e+00]
  [-1.5250e+04 -1.5250e+04 -1.5250e+04 -1.5250e+04  0.0000e+00]
  [-1.5549e+04 -1.5549e+04 -1.5549e+04 -1.5549e+04  0.0000e+00]
  [-1.5848e+04 -1.5848e+04 -1.5848e+04 -1.5848e+04  0.0000e+00]
  [-1.6147e+04 -1.6147e+04 -1.6147e+04 -1.6147e+04  0.0000e+00]
  [-1.6446e+04 -1.6446e+04 -1.6446e+04 -1.6446e+04  0.0000e+00]
  [-1.6745e+04 -1.6745e+04 -1.6745e+04 -1.6745e+04  0.0000e+00]
  [-1.7044e+04 -1.7044e+04 -1.7044e+04 -1.7044e+04  0.0000e+00]
  [-1.7343e+04 -1.7343e+04 -1.7343e+04 -1.7343e+04  0.0000e+00]
  [-1.7642e+04 -1.7642e+04 -1.7642e+04 -1.7642e+04  0.0000e+00]
  [-1.7941e+04 -1.7941e+04 -1.7941e+04 -1.7941e+04  0.0000e+00]
  [-1.8240e+04 -1.8240e+04 -1.8240e+04 -1.8240e+04  0.0000e+00]
  [-1.8539e+04 -1.8539e+04 -1.8539e+04 -1.8539e+04  0.0000e+00]
  [-1.8838e+04 -1.8838e+04 -1.8838e+04 -1.8838e+04  0.0000e+00]
  [-1.9137e+04 -1.9137e+04 -1.9137e+04 -1.9137e+04  0.0000e+00]
  [-1.9436e+04 -1.9436e+04 -1.9436e+04 -1.9436e+04  0.0000e+00]
  [-1.9735e+04 -1.9735e+04 -1.9735e+04 -1.9735e+04  0.0000e+00]
  [-2.0034e+04 -2.0034e+04 -2.0034e+04 -2.0034e+04  0.0000e+00]
  [-2.0333e+04 -2.0333e+04 -2.0333e+04 -2.0333e+04  0.0000e+00]
  [-2.0632e+04 -2.0632e+04 -2.0632e+04 -2.0632e+04  0.0000e+00]
  [-2.0931e+04 -2.0931e+04 -2.0931e+04 -2.0931e+04  0.0000e+00]
  [-2.1230e+04 -2.1230e+04 -2.1230e+04 -2.1230e+04  0.0000e+00]
  [-2.1529e+04 -2.1529e+04 -2.1529e+04 -2.1529e+04  0.0000e+00]
  [-2.1828e+04 -2.1828e+04 -2.1828e+04 -2.1828e+04  0.0000e+00]
  [-2.2127e+04 -2.2127e+04 -2.2127e+04 -2.2127e+04  0.0000e+00]
  [-2.2426e+04 -2.2426e+04 -2.2426e+04 -2.2426e+04  0.0000e+00]
  [-2.2725e+04 -2.2725e+04 -2.2725e+04 -2.2725e+04  0.0000e+00]
  [-2.3024e+04 -2.3024e+04 -2.3024e+04 -2.3024e+04  0.0000e+00]
  [-2.3323e+04 -2.3323e+04 -2.3323e+04 -2.3323e+04  0.0000e+00]
  [-2.3622e+04 -2.3622e+04 -2.3622e+04 -2.3622e+04  0.0000e+00]
  [-1.0000e+00 -1.0000e+00 -1.0000e+00 -1.0000e+00  0.0000e+00]
  [-3.0000e+02 -3.0000e+02 -3.0000e+02 -3.0000e+02  0.0000e+00]
  [-5.9900e+02 -5.9900e+02 -5.9900e+02 -5.9900e+02  0.0000e+00]
  [-8.9800e+02 -8.9800e+02 -8.9800e+02 -8.9800e+02  0.0000e+00]
  [-1.1970e+03 -1.1970e+03 -1.1970e+03 -1.1970e+03  0.0000e+00]
  [-1.4960e+03 -1.4960e+03 -1.4960e+03 -1.4960e+03  0.0000e+00]
  [-1.7950e+03 -1.7950e+03 -1.7950e+03 -1.7950e+03  0.0000e+00]
  [-2.0940e+03 -2.0940e+03 -2.0940e+03 -2.0940e+03  0.0000e+00]
  [-2.3930e+03 -2.3930e+03 -2.3930e+03 -2.3930e+03  0.0000e+00]
  [-2.6920e+03 -2.6920e+03 -2.6920e+03 -2.6920e+03  0.0000e+00]
  [-2.9910e+03 -2.9910e+03 -2.9910e+03 -2.9910e+03  0.0000e+00]
  [-3.2900e+03 -3.2900e+03 -3.2900e+03 -3.2900e+03  0.0000e+00]
  [-3.5890e+03 -3.5890e+03 -3.5890e+03 -3.5890e+03  0.0000e+00]
  [-3.8880e+03 -3.8880e+03 -3.8880e+03 -3.8880e+03  0.0000e+00]
  [-4.1870e+03 -4.1870e+03 -4.1870e+03 -4.1870e+03  0.0000e+00]
  [-4.4860e+03 -4.4860e+03 -4.4860e+03 -4.4860e+03  0.0000e+00]
  [-4.7850e+03 -4.7850e+03 -4.7850e+03 -4.7850e+03  0.0000e+00]
  [-5.0840e+03 -5.0840e+03 -5.0840e+03 -5.0840e+03  0.0000e+00]
  [-5.3830e+03 -5.3830e+03 -5.3830e+03 -5.3830e+03  0.0000e+00]
  [-5.6820e+03 -5.6820e+03 -5.6820e+03 -5.6820e+03  0.0000e+00]]]

The labels response looks like the regular 80 COCO class indices, but I'm having a hard time decoding the dets response. It looks like 4 bounding-box coordinates plus 1 confidence score per detection, which gives the shape (1, 100, 5). Any idea what dets is supposed to represent? The output usually depends on the model itself, but I think the ONNX conversion renames the outputs to labels and dets.
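To make that reading concrete, here is a small sketch that decodes the two outputs under the assumption that each dets row is [x1, y1, x2, y2, score] and that labels holds the class index for the same row. The function name `decode_detections` and the 0.3 score threshold are illustrative choices, not part of MMDetection or Triton.

```python
import numpy as np


def decode_detections(dets, labels, score_thr=0.3):
    """Split raw outputs into boxes, scores and class ids above a threshold.

    Assumes each dets row is [x1, y1, x2, y2, score] and labels has one
    class index per row; a leading batch dimension of 1 is flattened away.
    """
    dets = np.asarray(dets, dtype=np.float32).reshape(-1, 5)
    labels = np.asarray(labels).reshape(-1).astype(np.int64)
    if dets.shape[0] != labels.shape[0]:
        raise ValueError("dets and labels must describe the same detections")
    boxes, scores = dets[:, :4], dets[:, 4]
    keep = scores > score_thr  # drop low-confidence rows
    return boxes[keep], scores[keep], labels[keep]
```

Applied to the response above, every score is 0.0, so no row would survive any positive threshold. Under this reading, none of the returned rows is a usable detection.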