ios · swift · multidimensional-array · coreml · microsoft-custom-vision

Working with a Multidimensional Array as a CoreML Model output


I have trained an object detection CoreML model using Microsoft's customvision.ai service and exported it to use in my app, which recognizes certain objects in real time using the camera. However, the CoreML model outputs a MultiArray of type Double. I have no idea how to decipher or use this data, as this is my first time working with multidimensional arrays. I have been trying to find out what a Custom Vision object detection model is supposed to output (something like a CGRect or a UIImage?) so I know what I am trying to convert my MultiArray to, but I cannot find this information anywhere on Microsoft's website. Microsoft seems to have a demo app for image classification models, but nothing for object detection models.

To get a sense of what might be in the MultiArray, I have tried printing it out and got this result:

Double 1 x 1 x 40 x 13 x 13 array

I have also tried printing the .strides property of the MultiArray and got this:

[6760, 6760, 169, 13, 1]
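
For reference, here is a minimal sketch of how those values can be printed after running the model through Vision (ObjectDetector is a placeholder for whatever class Xcode generates from the exported .mlmodel):

    import CoreML
    import CoreGraphics
    import Vision

    // Minimal sketch: run the exported model through Vision and print the raw
    // output's shape and strides. "ObjectDetector" is a placeholder for the
    // Xcode-generated model class.
    func inspectModelOutput(on image: CGImage) throws {
        let coreMLModel = try ObjectDetector(configuration: MLModelConfiguration()).model
        let vnModel = try VNCoreMLModel(for: coreMLModel)

        let request = VNCoreMLRequest(model: vnModel) { request, _ in
            guard let observation = request.results?.first as? VNCoreMLFeatureValueObservation,
                  let multiArray = observation.featureValue.multiArrayValue else { return }

            print(multiArray.shape)    // e.g. [1, 1, 40, 13, 13]
            print(multiArray.strides)  // e.g. [6760, 6760, 169, 13, 1]
            print(multiArray.dataType) // .double
        }

        try VNImageRequestHandler(cgImage: image, options: [:]).perform([request])
    }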

I don't know if this info is actually useful; I just wanted to give you everything I have done so far.

So, my question is: what information does this MultiArray hold (is it something like a UIImage or a CGRect, or something different?), and how can I convert it into a set of data that I can actually use?


Solution

  • 9 months later, I stumbled upon your question while trying to solve this exact problem. Having found the solution today, I thought I'd post it up.

    Have a look at this GitHub sample:

    https://github.com/Azure-Samples/cognitive-services-ios-customvision-sample/tree/master/CVS_ObjectDetectorSample_Swift

    It makes use of a CocoaPod named MicrosoftCustomVisionMobile.

    That pod contains the CVSInference framework, which has a class, CVSObjectDetector, that will do all the heavy lifting of parsing the 3-dimensional MLMultiArray output for you. All you need to do is feed it the UIImage to run detection on and run the inference. Then you can read the detected identifiers, their bounding boxes, and their confidences from the strongly typed properties of CVSObjectDetector. Make sure you transform the coordinates back into your view's space before drawing (a sketch of that mapping is below)!
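
    For that last coordinate transform, a minimal sketch (this assumes the detector hands back boxes as CGRects normalized to 0...1 with a top-left origin; if your source uses Vision's bottom-left convention, flip the y-axis first):

        import CoreGraphics

        // Sketch: map a bounding box given in normalized image coordinates
        // (0...1, top-left origin assumed) into the coordinate space of the
        // view you draw the overlay in.
        func viewRect(forNormalizedBox box: CGRect, in viewBounds: CGRect) -> CGRect {
            return CGRect(x: viewBounds.minX + box.minX * viewBounds.width,
                          y: viewBounds.minY + box.minY * viewBounds.height,
                          width: box.width * viewBounds.width,
                          height: box.height * viewBounds.height)
        }

        // Usage: overlayLayer.frame = viewRect(forNormalizedBox: detectedBox, in: previewView.bounds)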

    If you are working in Xamarin like me, you can use Objective Sharpie to create a C# binding for the pod, and you'll be in business.