When using the Windows Machine Learning library, the input to and output from ONNX models is often in either TensorFloat or ImageFeatureValue format.
My question: what is the difference between these? It seems I am able to change the input type in the automatically generated model.cs file after ONNX import (for body pose detection) from TensorFloat to ImageFeatureValue, and the code still runs. This makes it easier to work with video frames, for example, since I can then create my input via ImageFeatureValue.CreateFromVideoFrame(frame).
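
For concreteness, here is a minimal sketch of what the two binding styles look like; the session, the frame variable, and the input feature name "data" are assumptions for illustration, not names taken from any generated model.cs:

```csharp
using Windows.AI.MachineLearning;
using Windows.Media;

// Sketch only: "session" (a LearningModelSession), "frame" (a VideoFrame),
// and the input feature name "data" are assumed for illustration.
LearningModelBinding binding = new LearningModelBinding(session);

// Variant 1: bind the VideoFrame directly; Windows ML converts and
// tensorizes the frame for you.
binding.Bind("data", ImageFeatureValue.CreateFromVideoFrame(frame));

// Variant 2: bind a pre-built NCHW float32 tensor instead.
// binding.Bind("data", inputTensor);

LearningModelEvaluationResult results = await session.EvaluateAsync(binding, "run");
```

Both variants bind to the same input feature; the difference is whether you hand Windows ML a raw frame or a tensor you built yourself.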
Is there a reason why this might lead to problems, and what are the differences between the two when video frames are used as input? I don't see it in the documentation. And why does the generated model.cs create a TensorFloat rather than an ImageFeatureValue in the first place if the input is a video frame?
Found the answer here.
If Windows ML does not support your model's color format or pixel range, then you can implement conversions and tensorization. You'll create an NCHW four-dimensional tensor for 32-bit floats for your input value. See the Custom Tensorization Sample for an example of how to do this.
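
For illustration, here is a rough sketch of such manual tensorization, converting a VideoFrame into a [1, 3, H, W] TensorFloat; the BGRA8 pixel format, the RGB channel order, and the scaling to [0, 1] are assumptions that would have to match what the model actually expects:

```csharp
using System.Runtime.InteropServices.WindowsRuntime;
using Windows.AI.MachineLearning;
using Windows.Graphics.Imaging;
using Windows.Media;

// Sketch: manually tensorize a VideoFrame into an NCHW float32 TensorFloat.
// Assumes the frame carries a SoftwareBitmap and the model wants RGB in [0, 1].
SoftwareBitmap bitmap = SoftwareBitmap.Convert(
    frame.SoftwareBitmap, BitmapPixelFormat.Bgra8, BitmapAlphaMode.Ignore);

int width = bitmap.PixelWidth;
int height = bitmap.PixelHeight;
byte[] pixels = new byte[4 * width * height];
bitmap.CopyToBuffer(pixels.AsBuffer());

// NCHW layout: one contiguous plane per channel, scaled from [0, 255] to [0, 1].
int planeSize = width * height;
float[] data = new float[3 * planeSize];
for (int i = 0; i < planeSize; i++)
{
    data[0 * planeSize + i] = pixels[4 * i + 2] / 255f; // R (BGRA byte order)
    data[1 * planeSize + i] = pixels[4 * i + 1] / 255f; // G
    data[2 * planeSize + i] = pixels[4 * i + 0] / 255f; // B
}

TensorFloat input = TensorFloat.CreateFromArray(
    new long[] { 1, 3, height, width }, data);
```

The ImageFeatureValue path performs this kind of conversion for you, which is presumably why it only works when the model's expected color format and pixel range are ones Windows ML supports.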