I've read Apple's official RealtimeNumberReader sample. It uses AVCaptureSession and the method layerRectConverted(fromMetadataOutputRect:), which is only available on AVCaptureVideoPreviewLayer, to convert a bounding box into screen coordinates:
let rect = layer.layerRectConverted(fromMetadataOutputRect: box.applying(self.visionToAVFTransform))
Now I want to recognize text in an ARFrame's capturedImage and then display the bounding box on screen. Is that possible?
I know how to recognize text in a single image from the official tutorial. My problem is how to convert the normalized bounding box coordinates to viewport coordinates.
Please help and thank you very much!!!
Try looking at this git repo. Having messed with it myself, I found it is not the most performant, but it should give you a start.
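For the coordinate conversion itself, here is a minimal sketch of one common approach rather than a definitive implementation. It assumes the Vision request ran on capturedImage without any extra orientation (i.e. `.up`), so the box is normalized to the captured image with a lower-left origin. The helper name `viewportRect(forVisionBox:displayTransform:viewportSize:)` is hypothetical; the display transform is passed in as a parameter so the function stays independent of ARKit.

```swift
import CoreGraphics

/// Hypothetical helper: converts a Vision bounding box (normalized, lower-left origin)
/// into a rect in viewport points.
/// `displayTransform` is assumed to map normalized image coordinates (upper-left origin)
/// to normalized viewport coordinates.
func viewportRect(forVisionBox box: CGRect,
                  displayTransform: CGAffineTransform,
                  viewportSize: CGSize) -> CGRect {
    // Flip the y axis: y' = 1 - y, moving the origin from lower-left to upper-left.
    let flipY = CGAffineTransform(a: 1, b: 0, c: 0, d: -1, tx: 0, ty: 1)
    // Scale normalized viewport coordinates up to points.
    let toPoints = CGAffineTransform(a: viewportSize.width, b: 0,
                                     c: 0, d: viewportSize.height,
                                     tx: 0, ty: 0)
    // CGRect.applying returns the bounding rect of the transformed corners.
    return box
        .applying(flipY)
        .applying(displayTransform)
        .applying(toPoints)
}
```

In an ARKit app, the transform would come from `frame.displayTransform(for: interfaceOrientation, viewportSize: view.bounds.size)`, where `frame` is the current ARFrame and `view` is the view showing the camera feed. If you pass a different orientation to the Vision request handler, the flip step would need to be adjusted accordingly.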