Tags: c++, onnxruntime

Consuming an ML.NET trained model from the C++ OnnxRuntime


I wrote a C# program to train a multiclass classification model using Microsoft ML.NET. The training completes successfully, and I have exported the model as an ONNX file using the Microsoft.ML.OnnxConverter package.

I would like to consume the ONNX model from within a C++ program (running on x64-windows) to run inference (the prediction task).
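
For reference, the session object used in the snippets below is created in the usual way, roughly as follows (the model path is a placeholder for the actual exported file):

    #include <onnxruntime_cxx_api.h>

    Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "mlnet_inference");
    Ort::SessionOptions session_options;
    session_options.SetIntraOpNumThreads(1);

    // "model.onnx" is a placeholder path; on x64-windows the model path
    // is passed as a wide string.
    Ort::Session session(env, L"model.onnx", session_options);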

The shape of the input and output in my model is:

Input:
  Features:              float 1x7
  code_point:            float 1x1
Output:
  Features.output:       float 1x7
  code_point.output:     float 1x1
  PredictedLabel.output: float 1x1
  Score.output:          float 1x94

Note: code_point is of the uint32_t datatype, as noted in the answer below. I am leaving the question as is, with this note included.

The relevant code for invoking the inference is:

    constexpr size_t input_tensor_size = 8;
    std::vector<float> input_tensor_values(input_tensor_size);

    // initialize the input_tensor_values
    ...

    // create input tensor object from data values
    auto memory_info = Ort::MemoryInfo::CreateCpu(OrtArenaAllocator, OrtMemTypeDefault);
    auto input_tensor = Ort::Value::CreateTensor<float>(
        memory_info, 
        input_tensor_values.data(), input_tensor_size,
        input_node_dims.data(), input_node_dims.size());

    std::vector<const char*> output_node_names = { 
        "Features.output", "code_point.output", 
        "PredictedLabel.output", "Score.output" 
    };

    // score model & input tensor, get back output tensor
    auto output_tensors =
        session.Run(
          Ort::RunOptions{ nullptr }, 
          input_node_names.data(), 
          &input_tensor, 1, 
          output_node_names.data(), 1);

I am getting an access violation error upon invoking session.Run() and I cannot figure out the cause. I suspect it is either the input tensor being flattened into a 1x8 vector before being passed in, or the number of outputs being passed to Run() as 1. I have tried passing 4 instead, but that doesn't work either.

Could you please suggest the right sequence for initializing the tensors and calling the Run() function for the shape of the input/output given above?


Solution

  • I found the mistake after @Botje pointed out the issue in a comment under the question.

    First of all, there is a small error in the model as listed in the question: code_point is of the uint32_t datatype, not float. The correct model is:

    Input:
       Features               float    1x7,
       code_point             uint32_t 1x1
    Output:
       Features.output        float    1x7,
       code_point.output      uint32_t 1x1,
       PredictedLabel.output  uint32_t 1x1,
       Score.output           float    1x94
    

    Secondly, as @Botje pointed out, there are two inputs to the model, viz. Features and code_point, so two input tensors need to be created and passed to Run() rather than one.

    I created simple structs to hold the model input and output and pass them around:

    // Mirrors the model inputs.
    struct model_input
    {
    public:
        std::vector<float>  features;       // Features:   float 1x7
        uint32_t            code_point;     // code_point: uint32_t 1x1
    public:
        model_input()
        {
            features.resize(7, 0.0f);
            code_point = 0u;
        }
    };

    // Mirrors the model outputs.
    struct model_output
    {
    public:
        std::vector<float>  features;       // Features.output:       float 1x7
        uint32_t            code_point;     // code_point.output:     uint32_t 1x1
        uint32_t            PredictedLabel; // PredictedLabel.output: uint32_t 1x1
        std::vector<float>  Score;          // Score.output:          float 1x94
    public:
        model_output()
        {
            features.resize(7, 0.0f);
            code_point = 0u;
            PredictedLabel = 0u;
            Score.resize(94, 0.0f);
        }
    };
    

    The working sequence for the initialization and inference is as follows:

        // copy the test input values into "Features"
        model_input mdl_input;
        mdl_input.features = {
            0.204244,
            0.0475028,
            -0.00872255,
            -0.0037717,
            -0.0122744,
            0.0262117,
            -0.000971803
        };
        mdl_input.code_point = 44u;
        model_output mdl_output;
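
        // Shapes of the two input and four output tensors, taken from the
        // model description above (1x7, 1x1 in; 1x7, 1x1, 1x1, 1x94 out).
        std::vector<std::vector<int64_t>> inputDims  = { { 1, 7 }, { 1, 1 } };
        std::vector<std::vector<int64_t>> outputDims = { { 1, 7 }, { 1, 1 }, { 1, 1 }, { 1, 94 } };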
    
        Ort::MemoryInfo memoryInfo = Ort::MemoryInfo::CreateCpu(OrtAllocatorType::OrtArenaAllocator, OrtMemType::OrtMemTypeDefault);
    
        std::vector<Ort::Value> inputTensors;
    
        //Features is float:7
        inputTensors.push_back(Ort::Value::CreateTensor<float>(memoryInfo, mdl_input.features.data(), mdl_input.features.size(), inputDims[0].data(), inputDims[0].size()));
    
        //code_point is uint32_t:1
        inputTensors.push_back(Ort::Value::CreateTensor<uint32_t>(memoryInfo, &mdl_input.code_point, 1, inputDims[1].data(), inputDims[1].size()));
    
        std::vector<Ort::Value> outputTensors;
    
        // Features.output is float:7
        outputTensors.push_back(Ort::Value::CreateTensor<float>(memoryInfo, mdl_output.features.data(), mdl_output.features.size(), outputDims[0].data(), outputDims[0].size()));
    
        // code_point.output is uint32_t:1
        outputTensors.push_back(Ort::Value::CreateTensor<uint32_t>(memoryInfo, &mdl_output.code_point, 1, outputDims[1].data(), outputDims[1].size()));
    
        // PredictedLabel is uint32_t:1
        outputTensors.push_back(Ort::Value::CreateTensor<uint32_t>(memoryInfo, &mdl_output.PredictedLabel, 1, outputDims[2].data(), outputDims[2].size()));
    
        // Score is float:94
        outputTensors.push_back(Ort::Value::CreateTensor<float>(memoryInfo, mdl_output.Score.data(), mdl_output.Score.size(), outputDims[3].data(), outputDims[3].size()));
    
        // names are hard-coded!
        std::vector<const char*> input_names_ptrs =
        {
            "Features",
            "code_point"
        };
    
        std::vector<const char*> output_names_ptrs =
        {
            "Features.output",
            "code_point.output",
            "PredictedLabel.output",
            "Score.output"
        };
    
        session.Run(
            Ort::RunOptions{ nullptr }, 
            input_names_ptrs.data(),
            inputTensors.data(),
            inputTensors.size(),  //Number of inputs 
            output_names_ptrs.data(),
            outputTensors.data(),
            outputTensors.size()   //Number of outputs
        );
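
        // Because the output tensors wrap the buffers inside mdl_output, the
        // predictions can be read directly from the struct once Run() returns.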
    
        std::cout << "expected: " << mdl_input.code_point << ", predicted: " << mdl_output.code_point << std::endl;
    

    After these fixes, the program generated the output:

    expected: 44, predicted: 44
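
    As a side note, the input/output names and shapes do not have to be hard-coded; they can be queried from the session at run time. A minimal sketch, assuming a recent OnnxRuntime release (GetInputNameAllocated exists in newer versions; older ones use the deprecated GetInputName), could look like this:

        Ort::AllocatorWithDefaultOptions allocator;

        // Enumerate the model inputs: name, element type and rank.
        for (size_t i = 0; i < session.GetInputCount(); ++i)
        {
            auto name        = session.GetInputNameAllocated(i, allocator);
            auto type_info   = session.GetInputTypeInfo(i);
            auto tensor_info = type_info.GetTensorTypeAndShapeInfo();
            std::vector<int64_t> shape = tensor_info.GetShape();

            std::cout << "input " << i << ": " << name.get()
                      << ", element type " << tensor_info.GetElementType()
                      << ", rank " << shape.size() << std::endl;
        }

        // The outputs can be listed the same way with GetOutputCount(),
        // GetOutputNameAllocated() and GetOutputTypeInfo().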