Tags: swiftui, realitykit, visionos, apple-vision, createml

Object recognition using Apple Vision Pro


My goal is to integrate an .mlmodel into an Apple Vision Pro app. However, I have had trouble finding appropriate sample code to achieve this integration.

I would like to share my progress so far. My primary objective is to detect the presence of a real phone in the physical world using Apple Vision Pro. This involves accessing the Vision Pro camera to capture video input, processing the frames with the model, and then determining the appropriate actions based on the detection results.
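From what I've found so far, reading the main camera feed on visionOS appears to require the Enterprise "Camera Access" entitlement plus ARKit's CameraFrameProvider (visionOS 2.0+). This is the kind of frame-capture sketch I have in mind for feeding the handler class below, though I'm not sure it's the right approach:

import ARKit
import CoreImage
import UIKit

// Sketch: pull frames from the main camera (visionOS 2 Enterprise API,
// needs the Camera Access entitlement) and hand them to the recognizer.
func startCameraRecognition(with recognizer: ImageRecognitionHandler) async throws {
    let session = ARKitSession()
    let frameProvider = CameraFrameProvider()
    let ciContext = CIContext()

    // Pick a supported video format for the left main camera.
    guard let format = CameraVideoFormat
        .supportedVideoFormats(for: .main, cameraPositions: [.left])
        .first else { return }

    try await session.run([frameProvider])

    guard let updates = frameProvider.cameraFrameUpdates(for: format) else { return }
    for await frame in updates {
        guard let pixelBuffer = frame.sample(for: .left)?.pixelBuffer else { continue }

        // Render the pixel buffer into a CGImage so the UIImage-based
        // recognizeImage(_:) below can consume it.
        let ciImage = CIImage(cvPixelBuffer: pixelBuffer)
        if let cgImage = ciContext.createCGImage(ciImage, from: ciImage.extent) {
            recognizer.recognizeImage(UIImage(cgImage: cgImage))
        }
    }
}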

Any guidance or suggestions on how to effectively implement this would be greatly appreciated.

// ImageRecognitionHandler: detects objects with a Core ML model via Vision

import Foundation
import CoreML
import Vision
import SwiftUI
import UIKit

class ImageRecognitionHandler: ObservableObject {
    @Published var recognizedObjects: [String] = []
    private var model: VNCoreMLModel

    init() {
        // Load the custom Core ML model and wrap it for use with Vision.
        do {
            let configuration = MLModelConfiguration()
            let phoneRecognitionModel = try PhoneRecognition1(configuration: configuration)
            model = try VNCoreMLModel(for: phoneRecognitionModel.model)
        } catch {
            fatalError("Failed to load CoreML model: \(error.localizedDescription)")
        }
    }

    func recognizeImage(_ image: UIImage) {
        let request = VNCoreMLRequest(model: model) { [weak self] request, error in
            guard let results = request.results as? [VNRecognizedObjectObservation], error == nil else {
                print("Failed to perform image recognition: \(error?.localizedDescription ?? "Unknown error")")
                return
            }

            // Keep the top label of every detected object and publish the result.
            let recognizedObjectIdentifiers = results.compactMap { $0.labels.first?.identifier }
            DispatchQueue.main.async {
                self?.recognizedObjects = recognizedObjectIdentifiers
            }

            if recognizedObjectIdentifiers.contains("phone") {
                print("phone")
            } else {
                print("no phone")
            }
        }

        guard let cgImage = image.cgImage else { return }
        let handler = VNImageRequestHandler(cgImage: cgImage)
        do {
            try handler.perform([request])
        } catch {
            print("Failed to perform request: \(error.localizedDescription)")
        }
    }
}

// Main app entry point
import SwiftUI
import Vision


@main
struct TestVisionTrackerApp: App {
    var body: some Scene {
        WindowGroup {
            ContentView()   
        }
    }
}

// ContentView
import SwiftUI
import RealityKit
import AVFoundation


struct ContentView: View {
    @StateObject private var imageRecognitionHandler = ImageRecognitionHandler()

    var body: some View {
        VStack {
            Button("Recognize Image") {
                
                
                if let videoDevice =  AVCaptureDevice.authorizationStatus(for: AVMediaType.video) == .authorized {
                    imageRecognitionHandler.recognizeImage(.????) //i don't know what to put in here
                } else {
                    print("Failed to load image")
                }
            }
            .padding()
        }
    }
}

#Preview(windowStyle: .automatic) {
    ContentView()
}

Solution

  • Object Tracking in visionOS 2.0 and iOS 18.0

    In visionOS 2.0, Apple introduced the ability to track real-world objects, bringing functionality similar to what ARKit's ARObjectAnchor provides on iOS. To implement this, you'll need macOS 15 Sequoia (or later) and Xcode 16. Using RealityKit's Photogrammetry API, create a USDZ model of your phone. Then, in the Create ML app, create a project from the Object Tracking template and place your 3D model at the origin of the XYZ coordinate system. Training produces a .referenceobject file; based on the data in this file, you can generate a trackable ObjectAnchor.
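
    For the USDZ step, a rough sketch of the photogrammetry pass is below. It runs on macOS as a command-line tool; the input/output paths are placeholders, and PhotogrammetrySession needs a Mac with a supported GPU:

    import Foundation
    import RealityKit

    // macOS sketch: turn a folder of photos of the phone into a USDZ model.
    let inputFolder = URL(fileURLWithPath: "/path/to/PhonePhotos", isDirectory: true)
    let outputURL = URL(fileURLWithPath: "/path/to/iPhoneX.usdz")

    let session = try PhotogrammetrySession(input: inputFolder)

    // Listen for results before kicking off processing.
    Task {
        for try await output in session.outputs {
            switch output {
            case .processingComplete:
                print("Done: \(outputURL.path)")
            case .requestError(let request, let error):
                print("Request \(request) failed: \(error)")
            default:
                break
            }
        }
    }

    try session.process(requests: [
        .modelFile(url: outputURL, detail: .full)
    ])

    RunLoop.main.run()   // keep the command-line tool alive while processing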



    Here's the code:

    import SwiftUI
    import RealityKit
    import RealityKitContent
    
    struct ContentView : View {
        let rkcb = realityKitContentBundle
        // Pass the extension without a leading dot,
        // otherwise the bundle lookup fails.
        let url = Bundle.main.url(forResource: "iPhoneX",
                                withExtension: "referenceobject")!
    
        var body: some View {
            RealityView { rvc in
                // Entity from the RealityKitContent bundle that will be
                // attached to the tracked phone.
                let description = try! await Entity(named: "forPhone", in: rkcb)
                
                // Anchor driven by the .referenceobject file from Create ML.
                let anchor = AnchorEntity(.referenceObject(from: .init(url)),
                                                   trackingMode: .predicted)
                anchor.addChild(description)
                rvc.add(anchor)
            }
        }
    }
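
    Alternatively, if you need the detection result in code (as in your recognizeImage logic) rather than just an anchored entity, the same .referenceobject file can be loaded through ARKit's ObjectTrackingProvider. A minimal sketch, assuming visionOS 2.0:

    import ARKit

    // Sketch: track the phone via ARKit so you can react to
    // detection events (added / updated / removed) in code.
    func trackPhone() async throws {
        let url = Bundle.main.url(forResource: "iPhoneX",
                                  withExtension: "referenceobject")!
        let referenceObject = try await ReferenceObject(from: url)

        let session = ARKitSession()
        let provider = ObjectTrackingProvider(referenceObjects: [referenceObject])
        try await session.run([provider])

        for await update in provider.anchorUpdates {
            switch update.event {
            case .added:
                print("Phone detected at \(update.anchor.originFromAnchorTransform)")
            case .updated:
                break
            case .removed:
                print("Phone lost")
            }
        }
    }

    Here, anchorUpdates delivers an event stream, so you can branch on whether the phone is currently visible instead of classifying individual camera frames.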