Tags: ios, speech-recognition, avspeechsynthesizer, sfspeechrecognizer

iOS: AVSpeechSynthesizer doesn't work after recording with SFSpeechRecognizer


I am making an application that does text-to-speech and speech-to-text.

The problem I am having right now is that text-to-speech works fine using AVSpeechSynthesizer. But after I record and do speech-to-text using SFSpeechRecognizer, the text-to-speech stops working (i.e., it doesn't talk back).

I am new to Swift too. I got this code from a couple of different tutorials and tried to merge them together.

Here's my code:

    private var speechRecognizer = SFSpeechRecognizer(locale: Locale.init(identifier: "en-US"))!
    private var recognitionRequest: SFSpeechAudioBufferRecognitionRequest?
    private var recognitionTask: SFSpeechRecognitionTask?
    private var audioEngine = AVAudioEngine()

    @objc(speak:location:date:callback:)
    func speak(name: String, location: String, date: NSNumber,_ callback: @escaping (NSObject) -> ()) -> Void {
      let utterance = AVSpeechUtterance(string: name)
      let synthesizer = AVSpeechSynthesizer()
      synthesizer.speak(utterance)
    }


    @available(iOS 10.0, *)
    @objc(startListening:location:date:callback:)
    func startListening(name: String, location: String, date: NSNumber,_ callback: @escaping (NSObject) -> ()) -> Void {
        if audioEngine.isRunning {
            audioEngine.stop()
            recognitionRequest?.endAudio()


        } else {

            if recognitionTask != nil {  //1
                recognitionTask?.cancel()
                recognitionTask = nil
            }

            let audioSession = AVAudioSession.sharedInstance()  //2
            do {
                try audioSession.setCategory(AVAudioSessionCategoryPlayAndRecord)
                try audioSession.setMode(AVAudioSessionModeMeasurement)
                try audioSession.setActive(true, with: .notifyOthersOnDeactivation)
            } catch {
                print("audioSession properties weren't set because of an error.")
            }

            recognitionRequest = SFSpeechAudioBufferRecognitionRequest()  //3

            guard let inputNode = audioEngine.inputNode else {
                fatalError("Audio engine has no input node")
            }  //4

            guard let recognitionRequest = recognitionRequest else {
                fatalError("Unable to create an SFSpeechAudioBufferRecognitionRequest object")
            } //5

            recognitionRequest.shouldReportPartialResults = true  //6

            recognitionTask = speechRecognizer.recognitionTask(with: recognitionRequest, resultHandler: { (result, error) in  //7

                var isFinal = false  //8

                if result != nil {

                    print(result?.bestTranscription.formattedString)  //9
                    isFinal = (result?.isFinal)!
                }

                if error != nil || isFinal {  //10
                    self.audioEngine.stop()
                    inputNode.removeTap(onBus: 0)

                    self.recognitionRequest = nil
                    self.recognitionTask = nil


                }
            })

            let recordingFormat = inputNode.outputFormat(forBus: 0)  //11
            inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { (buffer, when) in
                self.recognitionRequest?.append(buffer)
            }

            audioEngine.prepare()  //12

            do {
                try audioEngine.start()
            } catch {
                print("audioEngine couldn't start because of an error.")
            }
        }

    }

Solution

  • They both use the shared AVAudioSession, but they need it configured differently.

    For AVSpeechSynthesizer I suppose it has to be set to (the snippets below are in Xamarin.iOS/C# syntax):

    _audioSession.SetCategory(AVAudioSessionCategory.Playback, 
    AVAudioSessionCategoryOptions.MixWithOthers);
    

    and for SFSpeechRecognizer:

    _audioSession.SetCategory(AVAudioSessionCategory.PlayAndRecord, 
    AVAudioSessionCategoryOptions.MixWithOthers);
    

    Hope it helps.
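
    Translated to the iOS 10-era Swift APIs used in the question, the idea is to reconfigure the shared audio session back to a playback-friendly state before speaking again (startListening leaves it in PlayAndRecord with Measurement mode, which can make playback silent). A minimal sketch; the helper name speakAfterRecognition is illustrative, not from the original post:

    ```swift
    import AVFoundation

    // Sketch only: restore a playback-friendly audio session configuration
    // before asking the synthesizer to speak again.
    func speakAfterRecognition(_ text: String, using synthesizer: AVSpeechSynthesizer) {
        let audioSession = AVAudioSession.sharedInstance()
        do {
            // The question's startListening sets PlayAndRecord + Measurement;
            // switch back so AVSpeechSynthesizer is audible again.
            try audioSession.setCategory(AVAudioSessionCategoryPlayback,
                                         with: .mixWithOthers)
            try audioSession.setMode(AVAudioSessionModeDefault)
            try audioSession.setActive(true)
        } catch {
            print("audioSession properties weren't set because of an error.")
        }
        synthesizer.speak(AVSpeechUtterance(string: text))
    }
    ```

    Resetting the mode matters as much as the category here: Measurement mode routes audio for analysis rather than normal playback, which is a common reason the synthesizer goes quiet after a recognition session.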