Swift SFSpeechRecognizer appending existing UITextView content


I'm using SFSpeechRecognizer in my app, and it works fine: a dedicated button (Start Speech Recognition) lets the end user dictate a comment into a UITextView instead of typing it.

But if the user types some text manually first and then starts speech recognition, the previously typed text is erased. The same happens if the user performs speech recognition twice on the same UITextView (recording a first part of the text, stopping, then recording again): the text from the first recording is erased.

Hence, I would like to know how I can append the text recognized by SFSpeechRecognizer to the existing content of the UITextView.

Here is my code:

func recordAndRecognizeSpeech(){

    if recognitionTask != nil {
        recognitionTask?.cancel()
        recognitionTask = nil
    }
    let audioSession = AVAudioSession.sharedInstance()
    do {
        try audioSession.setCategory(AVAudioSessionCategoryRecord)
        try audioSession.setMode(AVAudioSessionModeMeasurement)
        try audioSession.setActive(true, with: .notifyOthersOnDeactivation)
    } catch {
        print("audioSession properties weren't set because of an error.")
    }
    self.recognitionRequest = SFSpeechAudioBufferRecognitionRequest()
    guard let inputNode = audioEngine.inputNode else {
        fatalError("Audio engine has no input node")
    }
    let recognitionRequest = self.recognitionRequest
    recognitionRequest.shouldReportPartialResults = true

    recognitionTask = speechRecognizer?.recognitionTask(with: recognitionRequest, resultHandler: { (result, error) in
        var isFinal = false
        self.decaration.text = (result?.bestTranscription.formattedString)!

        isFinal = (result?.isFinal)!
        let bottom = NSMakeRange(self.decaration.text.characters.count - 1, 1)
        self.decaration.scrollRangeToVisible(bottom)

        if error != nil || isFinal {
            self.audioEngine.stop()
            inputNode.removeTap(onBus: 0)
            self.recognitionTask = nil
            self.recognitionRequest.endAudio()
            self.oBtSpeech.isEnabled = true
        }
    })
    let recordingFormat = inputNode.outputFormat(forBus: 0)
    inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { (buffer, when) in
        self.recognitionRequest.append(buffer)
    }
    audioEngine.prepare()

    do {
        try audioEngine.start()
    } catch {
        print("audioEngine couldn't start because of an error.")
    }

}

I tried to update

self.decaration.text = (result?.bestTranscription.formattedString)!

by

self.decaration.text += (result?.bestTranscription.formattedString)!

but that duplicates each recognized sentence.

Any idea how I can do that?


Solution

  • Try saving the text before starting the recognition system.

    func recordAndRecognizeSpeech(){
        // one change here
        let defaultText = self.decaration.text
    
        if recognitionTask != nil {
            recognitionTask?.cancel()
            recognitionTask = nil
        }
        let audioSession = AVAudioSession.sharedInstance()
        do {
            try audioSession.setCategory(AVAudioSessionCategoryRecord)
            try audioSession.setMode(AVAudioSessionModeMeasurement)
            try audioSession.setActive(true, with: .notifyOthersOnDeactivation)
        } catch {
            print("audioSession properties weren't set because of an error.")
        }
        self.recognitionRequest = SFSpeechAudioBufferRecognitionRequest()
        guard let inputNode = audioEngine.inputNode else {
            fatalError("Audio engine has no input node")
        }
        let recognitionRequest = self.recognitionRequest
        recognitionRequest.shouldReportPartialResults = true
    
        recognitionTask = speechRecognizer?.recognitionTask(with: recognitionRequest, resultHandler: { (result, error) in
            var isFinal = false
            // one change here
            self.decaration.text = defaultText + " " + (result?.bestTranscription.formattedString)!
    
            isFinal = (result?.isFinal)!
            let bottom = NSMakeRange(self.decaration.text.characters.count - 1, 1)
            self.decaration.scrollRangeToVisible(bottom)
    
            if error != nil || isFinal {
                self.audioEngine.stop()
                inputNode.removeTap(onBus: 0)
                self.recognitionTask = nil
                self.recognitionRequest.endAudio()
                self.oBtSpeech.isEnabled = true
            }
        })
        let recordingFormat = inputNode.outputFormat(forBus: 0)
        inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { (buffer, when) in
            self.recognitionRequest.append(buffer)
        }
        audioEngine.prepare()
    
        do {
            try audioEngine.start()
        } catch {
            print("audioEngine couldn't start because of an error.")
        }
    }
    

    result?.bestTranscription.formattedString returns the entire phrase recognized so far, not just the latest fragment. That's why `+=` duplicates sentences, and why you should reset self.decaration.text from the saved text each time you get a response from SFSpeechRecognizer.
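
    If the user records several times in a row, the same idea extends naturally: once a result is final, fold it into the saved baseline so the next session appends after it rather than overwriting it. A minimal sketch of that bookkeeping (the names `DictationController`, `baselineText`, and `handle(result:)` are illustrative, not part of the original code):

    ```swift
    import Speech
    import UIKit

    final class DictationController {
        let textView = UITextView()
        // Text that was in the view before the current recognition session.
        private var baselineText = ""

        // Call this when a recognition session starts: snapshot whatever
        // is already in the view (typed or previously dictated).
        func beginSession() {
            baselineText = textView.text ?? ""
        }

        // Call this from the recognitionTask result handler.
        func handle(result: SFSpeechRecognitionResult) {
            let spoken = result.bestTranscription.formattedString
            // Partial results contain the whole phrase so far, so always
            // rebuild from the baseline instead of appending to the view.
            textView.text = baselineText.isEmpty
                ? spoken
                : baselineText + " " + spoken

            if result.isFinal {
                // Persist the finished phrase so the next session
                // appends after it instead of erasing it.
                baselineText = textView.text
            }
        }
    }
    ```

    With this, `beginSession()` replaces the `let defaultText = ...` line and the result handler body shrinks to a call to `handle(result:)`; the repeated-recording case then works without any extra code.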