swift swiftui async-await

Async function runs in background on simulator but not on physical phone


I have an asynchronous function that uses the Vision framework to scan for text in an image. In the parent view I call this function within a Task { }, which works as expected on the simulator: the UI stays responsive and the output text is updated when the function completes. However, when I run the same code on my physical device (iPhone 13 Pro), the UI freezes while this function is running and only resumes when it completes. I understand that I should always trust the behavior on my phone, not my simulator, so what is wrong with my code? Thanks in advance!

The code for my function (iOS 17.5, Xcode 15.4):

func recognizeText(from image: UIImage) async {
    DispatchQueue.main.async {
        self.isLoading = true
    }
    guard let cgImage = image.cgImage else {
        self.isLoading = false
        return
    }

    let request = VNRecognizeTextRequest { [weak self] request, error in
        guard let self = self else { return }
        guard let observations = request.results as? [VNRecognizedTextObservation], error == nil else {
            self.alertItem = AlertContext.invalidOCR
            self.isLoading = false
            return
        }

        let text = observations.compactMap { $0.topCandidates(1).first?.string }.joined(separator: "\n")
        DispatchQueue.main.async {
            self.recognizedText = text.isEmpty ? "No recognized texts. Please try again." : text
            self.isLoading = false
        }
    }
    request.recognitionLevel = .accurate

    let requestHandler = VNImageRequestHandler(cgImage: cgImage, options: [:])
    DispatchQueue.global(qos: .userInitiated).async {
        try? requestHandler.perform([request])
    }
}

Update #1: This is how I call the function from the view - my camera view captures a photo which binds to the viewModel's takenPhoto, and I call the function when takenPhoto changes:

.onChange(of: viewModel.takenPhoto) {
    if let photo = viewModel.takenPhoto {
        Task {
            await viewModel.recognizeText(from: photo)
        }
    }
}

Solution

  • I do not believe that the problem is that it “runs in background [queue] on simulator but not on physical phone.” I think that the problem is that the work on this relatively high-priority background queue imposes a sufficient load on the device that it can interfere with the UI running on the main thread.

    That having been said, a few observations:

    1. You are using a QoS of .userInitiated. When doing something this computationally intensive, that priority can introduce micro hangs in the UI. I might suggest using .utility, which has a negligible impact on performance but minimizes these issues (see the snippet just after this list).

    2. You are correct that you should always test this on a device rather than relying solely on the performance you experience on the simulator. I would further recommend that you test performance on an optimized “release” build rather than a “debug” build. In my tests, I was seeing micro hangs in my UI with a debug build, but when I shifted to a release build, those disappeared.
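
    For example, as a minimal sketch against the dispatch call in your question, the only change point 1 requires (if you keep the GCD approach for now) is the queue’s QoS:

    // use .utility rather than .userInitiated so that this computationally
    // intensive OCR work competes less with the UI work on the main thread
    DispatchQueue.global(qos: .utility).async {
        try? requestHandler.perform([request])
    }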


    In general, one is well advised to avoid using GCD in conjunction with Swift concurrency. But when calling something slow and synchronous (such as perform), one must keep that work out of the Swift concurrency cooperative thread pool. We have a contract with Swift concurrency never to impede “forward progress” on this thread pool (because it is so limited). Back in the WWDC 2022 video Visualize and optimize Swift concurrency, Apple explicitly suggested keeping that sort of work in GCD and bridging it back to Swift concurrency using withCheckedContinuation (and its throwing and/or unsafe brethren).
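
    For illustration, that bridging pattern might look something like the following sketch (it assumes import Vision, reuses the AlertContext.invalidOCR error from your code, and the function name is just a placeholder):

    func recognizeTextBridged(in cgImage: CGImage) async throws -> String {
        try await withCheckedThrowingContinuation { (continuation: CheckedContinuation<String, Error>) in
            // Keep the slow, synchronous `perform` off the Swift concurrency
            // cooperative thread pool by dispatching it to a GCD queue.
            DispatchQueue.global(qos: .utility).async {
                let request = VNRecognizeTextRequest { request, error in
                    guard
                        let observations = request.results as? [VNRecognizedTextObservation],
                        error == nil
                    else {
                        continuation.resume(throwing: error ?? AlertContext.invalidOCR)
                        return
                    }

                    let text = observations
                        .compactMap { $0.topCandidates(1).first?.string }
                        .joined(separator: "\n")

                    continuation.resume(returning: text)
                }
                request.recognitionLevel = .accurate

                do {
                    try VNImageRequestHandler(cgImage: cgImage).perform([request])
                } catch {
                    continuation.resume(throwing: error)
                }
            }
        }
    }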

    Nowadays, rather than introducing a lot of GCD code into our codebase, we might use an actor with a custom executor to keep this slow work out of the Swift concurrency cooperative thread pool. That way we retire most of the GCD calls, but still supply a custom GCD queue as the actor’s executor:

    actor TextRecognizer {
        private let queue = DispatchSerialQueue(label: Bundle.main.bundleIdentifier! + ".TextRecognizer", qos: .utility)
    
        nonisolated var unownedExecutor: UnownedSerialExecutor {
            queue.asUnownedSerialExecutor()
        }
    
        func text(from cgImage: CGImage) async throws -> String {
            try await withCheckedThrowingContinuation { (continuation: CheckedContinuation<String, Error>) in
                do {
                    let request = VNRecognizeTextRequest { request, error in
                        guard
                            let observations = request.results as? [VNRecognizedTextObservation],
                            error == nil
                        else {
                            continuation.resume(throwing: error ?? AlertContext.invalidOCR)
                            return
                        }
    
                        let text = observations
                            .compactMap { $0.topCandidates(1).first?.string }
                            .joined(separator: "\n")
    
                        guard !text.isEmpty else {
                            continuation.resume(throwing: AlertContext.noRecognizedText)
                            return
                        }
    
                        continuation.resume(returning: text)
                    }
                    request.recognitionLevel = .accurate
    
                    let requestHandler = VNImageRequestHandler(cgImage: cgImage)
    
                    // to confirm that this is running on the correct queue, you could add a precondition:
                    //
                    // dispatchPrecondition(condition: .onQueue(queue))
    
                    try requestHandler.perform([request])
                } catch {
                    continuation.resume(throwing: error)
                }
            }
        }
    }
    

    FWIW, here is my full MRE:

    import SwiftUI
    import Observation
    import Vision
    import os.log
    
    struct ContentView: View {
        @State var viewModel = ViewModel()
    
        var body: some View {
            VStack(spacing: 16) {
                Image(systemName: "text.rectangle.page")
                    .imageScale(.large)
                    .foregroundStyle(.tint)
    
                Text("Image Processor")
    
                // show recognized text, if any
    
                if let recognizedText = viewModel.recognizedText, !recognizedText.isEmpty {
                    Text(recognizedText)
                        .lineLimit(5)
                }
    
                // show elapsed time if not zero
    
                if viewModel.elapsed != .zero {
                    Text("\(viewModel.elapsed.seconds, specifier: "%0.2f") seconds")
                        .monospacedDigit()
                }
    
                // show spinner if loading
    
                if viewModel.isLoading {
                    ProgressView()
                        .progressViewStyle(CircularProgressViewStyle())
                }
    
                // show error, if any
    
                if let error = viewModel.alertItem {
                    Text(error.localizedDescription)
                        .foregroundStyle(.red)
                }
    
                // button to start recognition
    
                Button("Start") {
                    Task {
                        let image = UIImage(named: "snapshot")
                        await viewModel.recognizeText(from: image)
                    }
                }
            }
            .padding()
        }
    }
    
    @Observable
    @MainActor
    class ViewModel {
        var isLoading = false
        var alertItem: Error?
        var recognizedText: String?
        var elapsed: ContinuousClock.Duration = .zero
    
        private let logger = Logger(subsystem: Bundle.main.bundleIdentifier!, category: "ViewModel")
    
        // In Xcode 15.4, you may need to explicitly add a `nonisolated` initializer, so uncomment the following
        //
        // nonisolated init() { }
    
        func startTimer() async {
            let start = ContinuousClock().now
            elapsed = .zero
    
            while !Task.isCancelled {
                try? await Task.sleep(for: .milliseconds(10))
                elapsed = .now - start
            }
        }
    
        func recognizeText(from image: UIImage?) async {
            guard let image else {
                alertItem = AlertContext.imageNotFound
                return
            }
    
            alertItem = nil
    
            guard let cgImage = image.cgImage else {
                alertItem = AlertContext.imageNotFound
                isLoading = false
                return
            }
    
            let timerTask = Task { await startTimer() }
    
            do {
                isLoading = true
                defer {
                    isLoading = false
                    timerTask.cancel()
                }
    
                let recognizer = TextRecognizer()
                recognizedText = try await recognizer.text(from: cgImage)
            } catch {
                logger.error("\(error)")
                alertItem = error
            }
        }
    }
    
    actor TextRecognizer {
        private let queue = DispatchSerialQueue(label: Bundle.main.bundleIdentifier! + ".TextRecognizer", qos: .utility)
        let poi = OSSignposter(subsystem: "TextRecognizer", category: .pointsOfInterest)
    
        nonisolated var unownedExecutor: UnownedSerialExecutor {
            queue.asUnownedSerialExecutor()
        }
    
        func text(from cgImage: CGImage) async throws -> String {
            let state = poi.beginInterval(#function, id: poi.makeSignpostID())
            defer { poi.endInterval(#function, state) }
    
            return try await withCheckedThrowingContinuation { (continuation: CheckedContinuation<String, Error>) in
                do {
                    let request = VNRecognizeTextRequest { request, error in
                        guard
                            let observations = request.results as? [VNRecognizedTextObservation],
                            error == nil
                        else {
                            continuation.resume(throwing: error ?? AlertContext.invalidOCR)
                            return
                        }
    
                        let text = observations
                            .compactMap { $0.topCandidates(1).first?.string }
                            .joined(separator: "\n")
    
                        guard !text.isEmpty else {
                            continuation.resume(throwing: AlertContext.noRecognizedText)
                            return
                        }
    
                        continuation.resume(returning: text)
                    }
                    request.recognitionLevel = .accurate
    
                    let requestHandler = VNImageRequestHandler(cgImage: cgImage)
    
                    // to confirm that this is running on the correct queue, you could add a precondition:
                    //
                    // dispatchPrecondition(condition: .onQueue(queue))
    
                    try requestHandler.perform([request])
                } catch {
                    continuation.resume(throwing: error)
                }
            }
        }
    }
    
    enum AlertContext: LocalizedError {
        case invalidOCR
        case noRecognizedText
        case imageNotFound
    
        var errorDescription: String? {
            return switch self {
                case .invalidOCR:       String(localized: "Problem recognizing")
                case .noRecognizedText: String(localized: "No recognized text")
                case .imageNotFound:    String(localized: "Image not found")
            }
        }
    }
    
    extension Duration {
        var seconds: Double {
            let (seconds, attoseconds) = components
            return Double(seconds) + Double(attoseconds) / 1e18
        }
    }
    

    Note, I retired all of the DispatchQueue.main.async {…} code by using MainActor isolation in the above. Also, in practice, I would not show the elapsed time to the nearest hundredth of a second (lol), but I added this in my example to visually illustrate that the main thread was responsive while the text recognition was underway.

    But setting this aside, I profiled this in Instruments with the “Time Profiler” template and configured the “Hangs” tool to report “Include All Potential Interaction Delays”:

    [Screenshot: the “Hangs” tool configured to “Include All Potential Interaction Delays”]

    And we got a clean bill of health from the “Hangs” timeline when I profiled the app:

    [Screenshot: the “Hangs” timeline in Instruments reporting no hangs]

    Bottom line: testing a “release” build on a physical device (admittedly, a top-of-the-line iPhone) with .utility QoS, I see no interruptions in the UI. It is possible that less capable devices might see momentary hitches, but this is about the best one can do.