I'm trying to build an object detection model with Create ML. In addition to detecting the type of objects in a picture, I would like it to give me in the output the coordinates (or the position) of each detected object.
How can I get the coordinates of each detected object in the output picture?
Is it possible to do that with Create ML? And if yes, how?
Yes. When you run a Create ML object detection model through the Vision framework, the request's results are an array of VNRecognizedObjectObservation objects, one per detected object. Each observation contains an array of candidate labels with confidence values, a bounding box, and other information. The bounding box is in normalized coordinates (values from 0 to 1, with the origin in the lower-left corner), so you need to convert it into pixel coordinates with VNImageRectForNormalizedRect before you use it.
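Here is a minimal sketch of what that looks like in Swift. It assumes you have added your Create ML model to the project so Xcode generates a model class for it (the name `YourDetector` below is a placeholder for whatever your model is called):

```swift
import CoreML
import Vision

func detectObjects(in cgImage: CGImage) throws {
    // "YourDetector" is a placeholder for the class Xcode generates
    // from your Create ML .mlmodel file.
    let coreMLModel = try YourDetector(configuration: MLModelConfiguration()).model
    let visionModel = try VNCoreMLModel(for: coreMLModel)

    let request = VNCoreMLRequest(model: visionModel) { request, error in
        guard let observations = request.results as? [VNRecognizedObjectObservation] else {
            return
        }
        for observation in observations {
            // The labels array is sorted by confidence; take the best match.
            if let topLabel = observation.labels.first {
                print("Found \(topLabel.identifier) with confidence \(topLabel.confidence)")
            }
            // boundingBox is normalized (0...1, lower-left origin);
            // convert it to pixel coordinates for this image's size.
            let pixelRect = VNImageRectForNormalizedRect(observation.boundingBox,
                                                         cgImage.width,
                                                         cgImage.height)
            print("Bounding box in pixels: \(pixelRect)")
        }
    }

    let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
    try handler.perform([request])
}
```

Note that the pixel rectangle still uses Vision's lower-left origin; if you want to draw it in UIKit's coordinate space (top-left origin), you also need to flip the y-axis.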
Apple has an excellent sample app with detailed explanations which you can find here.