unity-game-engine, pytorch, onnx, onnxruntime, barracuda

Unexpected model output running Onnx model in Unity using Barracuda


Context

I am trying to use a pre-trained model in ONNX format to do inference on image data in Unity. The model is linked to the executing component in Unity as an asset called modelAsset. I am using Barracuda version 1.0.0 for this and executing the model as follows:

// Initialisation
this.model = ModelLoader.Load(this.modelAsset);
this.worker = WorkerFactory.CreateWorker(WorkerFactory.Type.CSharpBurst, this.model);

// Loop (once per image)
Tensor tensor = new Tensor(1, IMAGE_H, IMAGE_W, 3, data);
worker.Execute(tensor);
Tensor modelOutput = worker.PeekOutput(OUTPUT_NAME);
tensor.Dispose(); // input tensors must be disposed manually

The data going into the input tensor (the model has only one input) is image data of IMAGE_H * IMAGE_W pixels with 3 channels, holding RGB values normalised to the range [-0.5, 0.5]. The model has multiple outputs; the relevant one is retrieved in the last line shown above.
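For reference, the preprocessing described above can be sketched as follows. This is a NumPy sketch, not our actual Unity code; the image contents and dimensions are stand-ins, only the shape convention and the [-0.5, 0.5] scaling are taken from the description above:

```python
import numpy as np

IMAGE_H, IMAGE_W = 4, 4  # stand-in dimensions for illustration

# Hypothetical 8-bit RGB image, shape (H, W, 3)
img = np.zeros((IMAGE_H, IMAGE_W, 3), dtype=np.uint8)
img[..., 0] = 255  # pure red, just to have non-trivial data

# Scale 0..255 pixel values into the [-0.5, 0.5] range the model expects
data = img.astype(np.float32) / 255.0 - 0.5

# Flattened array for the Tensor(1, IMAGE_H, IMAGE_W, 3, data) constructor
flat = data.ravel()
assert flat.min() >= -0.5 and flat.max() <= 0.5
```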

Expected behavior

Given the same input data, the converted ONNX model running in Barracuda in Unity should produce the same output as the PyTorch and ONNX models produce in Python (via PyTorch and ONNX Runtime).

Problem

In Python, both the ONNX and PyTorch models produce the same output. However, the same ONNX model running in Barracuda produces a different output. The main difference is that we expect a heatmap, but Barracuda consistently produces values between -0.0004 and 0.001 in patterns like these: [image: data patterns in the Barracuda output]

This makes it almost seem like the model weights are not properly loaded.

What we found

When converting to ONNX as per the Barracuda manual, we found that if we did not set the PyTorch net to inference mode before conversion (link), ONNX Runtime in Python produced these same incorrect results. In other words, it looks like the training/inference mode is saved in the ONNX model and is honoured by ONNX Runtime in Python, but not by Barracuda.

Our question

In general:

And potentially:


Solution

  • It turned out that there were two problems.

    First, the input data had been laid out according to the ONNX model's dimensions, but Barracuda expects the data in a different order. Per the Barracuda documentation: "The native ONNX data layout is NCHW, or channels-first. Barracuda automatically converts ONNX models to NHWC layout." Our data was flattened into a channels-first array, as in the Python implementation, which created the first mismatch.

    Secondly, the Y-axis of the input image was inverted, making the model unable to recognize any people.

    After correcting both issues, the implementation works fine!