Tags: pytorch, onnx, tritonserver

ONNX Runtime: io_binding.bind_input causing "no data transfer from DeviceType:1 to DeviceType:0"


I am using NVIDIA Triton Inference Server with an ONNX model for inference on a GPU instance. The Dockerfile that defines the environment, inference server, and models contains the following FROM/pip lines:

FROM --platform=linux/amd64 nvcr.io/nvidia/tritonserver:23.12-py3

RUN pip install torch transformers onnx onnxruntime-gpu onnxruntime

The model.py for the Triton Inference Server has been simplified to the following:

import onnxruntime as ort
import torch
import numpy as np

session = ort.InferenceSession("path/to/onnx.model", providers=["CUDAExecutionProvider", "CPUExecutionProvider"])

...

# Bind the GPU tensor's memory directly as the model input (no host copy)
io_binding = session.io_binding()
pt_script_embeddings = torch.rand(
    size=(100, 2010), dtype=torch.float32, device="cuda:0"
).contiguous()

io_binding.bind_input(
    name="np_script_embeddings",
    device_type="cuda",
    device_id=0,
    element_type=np.float32,
    shape=tuple(pt_script_embeddings.shape),
    buffer_ptr=pt_script_embeddings.data_ptr(),
)

# Pre-allocate a GPU tensor for the logits and bind it as the output buffer
logit_output_shape = (100, 2)
logit_output = torch.empty(logit_output_shape, dtype=torch.float32, device="cuda:0").contiguous()
io_binding.bind_output(
    name="np_logits",
    device_type="cuda",
    device_id=0,
    element_type=np.float32,
    shape=tuple(logit_output.shape),
    buffer_ptr=logit_output.data_ptr(),
)

# Run inference with the bound device buffers, then copy the result to host
session.run_with_iobinding(io_binding)
outputs = logit_output.cpu().numpy()

Unfortunately, the error below is triggered by the io_binding.bind_input call, and it is causing me a lot of grief:

RuntimeError: Error when binding input: There's no data transfer registered for copying tensors from Device:[DeviceType:1 MemoryType:0 DeviceId:0] to Device:[DeviceType:0 MemoryType:0 DeviceId:0]
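
In ONNX Runtime's OrtDevice numbering, DeviceType:1 is the GPU and DeviceType:0 is the CPU, so the message says there is no registered way to copy the bound CUDA buffer into CPU memory. One quick sanity check (a minimal sketch reusing the session created above) is to print which execution providers were actually loaded, since requested providers are dropped silently when their libraries fail to initialize:

# Providers the session actually loaded; if this prints only
# ['CPUExecutionProvider'], the CUDA provider failed to initialize and
# there is no GPU transfer path for the bound buffer.
print(session.get_providers())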

Note: related articles were reviewed before this submission.


Solution

  • To resolve the issue I needed to carefully match the versions of CUDA, PyTorch, and ONNX Runtime provided by the tritonserver Docker image with the torch and onnxruntime-gpu Python packages installed manually. Here is the process in detail:
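
    First, collect the versions that the base tritonserver image actually ships, so they can be matched up. A minimal sketch (run inside the container; assumes torch and onnxruntime-gpu are importable there):

    import torch
    import onnxruntime as ort

    print("torch:", torch.__version__)
    print("torch CUDA:", torch.version.cuda)            # CUDA version torch was built against
    print("onnxruntime:", ort.__version__)
    print("providers:", ort.get_available_providers())  # should include CUDAExecutionProvider

    Compare these against the CUDA version of the chosen tritonserver image tag and the ONNX Runtime CUDA Execution Provider requirements table linked below.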

    Based on the collected versions, update the environment. In my case it is the Docker image with the following changes:

    FROM --platform=linux/amd64 nvcr.io/nvidia/tritonserver:23.10-py3
    
    RUN pip install transformers
    RUN pip install torch==2.1
    
    # https://onnxruntime.ai/docs/install/
    # https://onnxruntime.ai/docs/execution-providers/CUDA-ExecutionProvider.html#requirements
    RUN pip install onnxruntime-gpu --extra-index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/onnxruntime-cuda-12/pypi/simple/
    

    NOTE: if your build environment has no access to the Azure repo https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/onnxruntime-cuda-12/pypi/simple/ then retrieve and install the files manually from https://aiinfra.visualstudio.com/PublicPackages/_artifacts/feed/onnxruntime-cuda-12 (make sure to adjust cuda-12 to match your CUDA version).
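
    Once the image is rebuilt, a quick way to confirm the environment is consistent is to check that the CUDA provider is actually loaded before binding any buffers. A minimal sketch (assumes the same model path as in the question):

    import onnxruntime as ort

    session = ort.InferenceSession(
        "path/to/onnx.model",
        providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
    )
    # With matching CUDA/torch/onnxruntime-gpu versions this should list
    # CUDAExecutionProvider first; bind_input then accepts the CUDA buffer.
    print(session.get_providers())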