swift, async-await, sfspeechrecognizer

Proper way to support async-await for `func recognitionTask(with request: ..., resultHandler: @escaping (...) -> Void) -> SFSpeechRecognitionTask`


I'm trying to add async-await support to some existing closure-based methods, but I'm not sure how to handle the task returned by this method:

    open func recognitionTask(with request: SFSpeechRecognitionRequest, resultHandler: @escaping (SFSpeechRecognitionResult?, Error?) -> Void) -> SFSpeechRecognitionTask

So far, I wrote a method like below:

    func recognitionTask(with request: SFSpeechRecognitionRequest) async throws -> SFSpeechRecognitionResult? {
        var task: SFSpeechRecognitionTask?
        let cancelTask = { task?.cancel() }
        
        return try await withTaskCancellationHandler(
            operation: {
                try await withCheckedThrowingContinuation { continuation in
                    task = recognitionTask(with: request) { result, error in
                        if let error = error {
                            continuation.resume(throwing: error)
                        } else {
                            continuation.resume(returning: result)
                        }
                    }
                }
            },
            onCancel: { cancelTask() }
        )
    }

But I couldn't figure out how to access the task object returned by the line `task = recognitionTask(with: request)` while calling this method. I need the task object so that I can cancel the speech recognition when needed in the app.


Solution

  • You asked:

    But I couldn't figure out how to access the task object returned by the line task = recognitionTask(with: request) while calling this method.

    Right now you have a local reference to this task, and you also seem to suggest that you have an ivar. Either approach works, but pick one or the other, not both.

    You go on to ask:

    I need the task object so that I can cancel the speech recognition when needed in the app.

    You don't have to do that.

    While you might have that ivar for other reasons (see above), you do not have to use it to cancel the speech recognition. That is the whole point of `withTaskCancellationHandler`: you cancel the Swift concurrency task, and the cancellation handler takes care of canceling the `SFSpeechRecognitionTask` for you. You do not have to cancel the `SFSpeechRecognitionTask` directly.
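
    As a rough illustration of that cancellation flow, here is a minimal sketch that runs without the Speech framework. `FakeRecognitionTask`, `TaskBox`, and `finalResult()` are hypothetical names invented for this example; `FakeRecognitionTask` merely stands in for `SFSpeechRecognitionTask`, under the assumption that canceling it makes the handler report an error. The lock-protected box is one possible way to hand the recognition task to `onCancel` without sharing a mutable local variable between closures.

    ```swift
    import Foundation

    // Hypothetical stand-in for SFSpeechRecognitionTask. It never produces a
    // result on its own; canceling it reports CancellationError to the handler.
    final class FakeRecognitionTask: @unchecked Sendable {
        private let lock = NSLock()
        private var handler: ((String?, Error?) -> Void)?

        init(handler: @escaping (String?, Error?) -> Void) {
            self.handler = handler
        }

        func cancel() {
            lock.lock()
            let pending = handler
            handler = nil              // ensures the handler fires at most once
            lock.unlock()
            pending?(nil, CancellationError())
        }
    }

    // Hypothetical lock-protected holder, so `onCancel` can reach the task
    // from any thread, even if cancellation arrives before the task exists.
    final class TaskBox: @unchecked Sendable {
        private let lock = NSLock()
        private var task: FakeRecognitionTask?
        private var isCancelled = false

        func set(_ newTask: FakeRecognitionTask) {
            lock.lock()
            let alreadyCancelled = isCancelled
            if !alreadyCancelled { task = newTask }
            lock.unlock()
            if alreadyCancelled { newTask.cancel() }   // canceled before we started
        }

        func cancel() {
            lock.lock()
            isCancelled = true
            let current = task
            task = nil
            lock.unlock()
            current?.cancel()
        }
    }

    // One-shot async wrapper: the continuation is resumed exactly once,
    // either with a result or with the error delivered after cancellation.
    func finalResult() async throws -> String {
        let box = TaskBox()
        return try await withTaskCancellationHandler {
            try await withCheckedThrowingContinuation { continuation in
                let task = FakeRecognitionTask { result, error in
                    if let error {
                        continuation.resume(throwing: error)
                    } else if let result {
                        continuation.resume(returning: result)
                    }
                }
                box.set(task)
            }
        } onCancel: {
            box.cancel()
        }
    }
    ```

    With a wrapper shaped like this, the caller keeps only the Swift concurrency `Task` (for example `let job = Task { try await finalResult() }`) and calls `job.cancel()` to stop; the recognition task itself never leaves the wrapper.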


    If you are doing real-time speech recognition, `withCheckedThrowingContinuation` introduces a problem: it is designed for one-time events. As the `withCheckedThrowingContinuation` docs will tell you, the closure you pass receives a continuation, and …

    … You must resume the continuation exactly once.

    But recognitionTask(with:resultHandler:) will call its handler multiple times. As the docs say:

    The block to call when partial or final results are available, or when an error occurs. If the shouldReportPartialResults property is true, this block may be called multiple times to deliver the partial and final results.

    If you have overridden the default value of `shouldReportPartialResults` and set it to false, then `withCheckedThrowingContinuation` will work. But if you want to show the progress of the voice recognition (which is critical feedback to the user during real-time recognition), it is better to make this an `AsyncSequence` instead.
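
    To make the "exactly once" constraint concrete, here is a minimal sketch of a one-shot wrapper around a handler that fires more than once. `FakeResult`, `fakeRecognize(handler:)`, and `finalText()` are hypothetical stand-ins invented for this example, not Speech framework API. The fake calls its handler synchronously, so a plain flag is enough here; a real handler runs asynchronously, and the flag would then need synchronization.

    ```swift
    // Hypothetical stand-in for a recognition result.
    struct FakeResult {
        let text: String
        let isFinal: Bool
    }

    // Hypothetical stand-in for a callback API that reports a partial
    // result first and the final result afterwards.
    func fakeRecognize(handler: (FakeResult?, Error?) -> Void) {
        handler(FakeResult(text: "hello", isFinal: false), nil)
        handler(FakeResult(text: "hello world", isFinal: true), nil)
    }

    // Resumes the continuation only for an error or the final result,
    // so partial results never trigger a second resume.
    func finalText() async throws -> String {
        try await withCheckedThrowingContinuation { continuation in
            var resumed = false
            fakeRecognize { result, error in
                guard !resumed else { return }
                if let error {
                    resumed = true
                    continuation.resume(throwing: error)
                } else if let result, result.isFinal {
                    resumed = true
                    continuation.resume(returning: result.text)
                }
            }
        }
    }
    ```

    The partial results are simply dropped in this shape, which is why a sequence is the better fit when the user should see progress.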

    For example, if you have an actor wrapping your speech recognizer, you might have a method that issues a sequence of strings as the recognition proceeds:

    extension SpeechRecognizer {
        func strings() throws -> AsyncThrowingStream<String, Error> {
            let request = try startRecording()

            return AsyncThrowingStream<String, Error> { continuation in
                var finished = false

                speechRecognitionTask = recognizer.recognitionTask(with: request) { result, error in
                    // Emit the best transcription so far (partial or final).
                    if let result {
                        continuation.yield(result.bestTranscription.formattedString)
                    }

                    let isFinal = result?.isFinal ?? false

                    // Finish the stream only once, on an error or the final result.
                    if !finished, (error != nil || isFinal) {
                        self.cleanup()
                        continuation.finish(throwing: error)
                        finished = true
                    }
                }

                // Runs when the stream finishes or the consuming task is canceled.
                continuation.onTermination = { _ in
                    Task {
                        await self.onTermination()
                    }
                }
            }
        }
    }
    

    Then you can use that sequence like so:

    private var task: Task<Void, Error>?
    
    func record() {
        recognizedText = ""
    
        task = Task {
            do {
                let sequence = try await recognizer.strings()
    
                for try await text in sequence {
                    recognizedText = text
                }
            } catch {
                …
            }
        }
    }
    
    func stop() {
        task?.cancel()
    }
    

    This way, while you are doing real-time speech recognition, the user can see the progress.


    The key observation is that when you want to stop the speech recognition (whether you are consuming a sequence of partial results or awaiting just the final result), you cancel the Swift concurrency task, not the `SFSpeechRecognitionTask`. The Swift concurrency cancellation system takes care of the rest (through the `onTermination` handler for a sequence, or through `withTaskCancellationHandler` for a one-time task).
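
    The sequence side of that claim can be sketched without the Speech framework as well. `FakeWork` and `partialStrings(work:)` are hypothetical names invented for this example; `FakeWork` stands in for whatever must be stopped (the recognition task, the audio engine, and so on). The stream deliberately never finishes on its own, so the only way out of the loop is canceling the consuming task.

    ```swift
    import Foundation

    // Hypothetical stand-in for the underlying work that must be stopped.
    final class FakeWork: @unchecked Sendable {
        private let lock = NSLock()
        private var cancelled = false

        var isCancelled: Bool {
            lock.lock()
            defer { lock.unlock() }
            return cancelled
        }

        func cancel() {
            lock.lock()
            cancelled = true
            lock.unlock()
        }
    }

    // Yields one partial result and then stays open, like a recognition
    // session that is still listening.
    func partialStrings(work: FakeWork) -> AsyncThrowingStream<String, Error> {
        AsyncThrowingStream { continuation in
            continuation.yield("hello")

            // Called when the stream finishes or its consumer is canceled.
            continuation.onTermination = { _ in
                work.cancel()
            }
        }
    }
    ```

    A consumer would iterate with `for try await text in partialStrings(work: work)` inside a `Task` and later call `cancel()` on that `Task`; the `onTermination` handler then stops the underlying work, and the caller never holds a reference to it.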