I'm using SFSpeechRecognizer in my app which is working fine to ease the end user entering a comment in a UITextView thanks to a dedicated button (Start Speech Recognition).
But if the user is typing some text manually first and then starts its Speech Recognition, the previous text entered manually is erased. This is also the case if the user is performing two times a Speech Recognition (user is "speech" recording a first part of its text, then stop recording, and finally restart recording) on the same UITextView, the previous text is erased.
Hence, I would like to know how I can append text recognized by SFSpeechRecognizer to the existing one.
Here is my code:
func recordAndRecognizeSpeech(){
if recognitionTask != nil {
recognitionTask?.cancel()
recognitionTask = nil
}
let audioSession = AVAudioSession.sharedInstance()
do {
try audioSession.setCategory(AVAudioSessionCategoryRecord)
try audioSession.setMode(AVAudioSessionModeMeasurement)
try audioSession.setActive(true, with: .notifyOthersOnDeactivation)
} catch {
print("audioSession properties weren't set because of an error.")
}
self.recognitionRequest = SFSpeechAudioBufferRecognitionRequest()
guard let inputNode = audioEngine.inputNode else {
fatalError("Audio engine has no input node")
}
let recognitionRequest = self.recognitionRequest
recognitionRequest.shouldReportPartialResults = true
recognitionTask = speechRecognizer?.recognitionTask(with: recognitionRequest, resultHandler: { (result, error) in
var isFinal = false
self.decaration.text = (result?.bestTranscription.formattedString)!
isFinal = (result?.isFinal)!
let bottom = NSMakeRange(self.decaration.text.characters.count - 1, 1)
self.decaration.scrollRangeToVisible(bottom)
if error != nil || isFinal {
self.audioEngine.stop()
inputNode.removeTap(onBus: 0)
self.recognitionTask = nil
self.recognitionRequest.endAudio()
self.oBtSpeech.isEnabled = true
}
})
let recordingFormat = inputNode.outputFormat(forBus: 0)
inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { (buffer, when) in
self.recognitionRequest.append(buffer)
}
audioEngine.prepare()
do {
try audioEngine.start()
} catch {
print("audioEngine couldn't start because of an error.")
}
}
I tried to update
self.decaration.text = (result?.bestTranscription.formattedString)!
by
self.decaration.text += (result?.bestTranscription.formattedString)!
but it makes a doubloon for each sentence recognized.
Any idea how I can do that ?
Try saving the text before starting the recognition system.
func recordAndRecognizeSpeech(){
// one change here
let defaultText = self.decaration.text
if recognitionTask != nil {
recognitionTask?.cancel()
recognitionTask = nil
}
let audioSession = AVAudioSession.sharedInstance()
do {
try audioSession.setCategory(AVAudioSessionCategoryRecord)
try audioSession.setMode(AVAudioSessionModeMeasurement)
try audioSession.setActive(true, with: .notifyOthersOnDeactivation)
} catch {
print("audioSession properties weren't set because of an error.")
}
self.recognitionRequest = SFSpeechAudioBufferRecognitionRequest()
guard let inputNode = audioEngine.inputNode else {
fatalError("Audio engine has no input node")
}
let recognitionRequest = self.recognitionRequest
recognitionRequest.shouldReportPartialResults = true
recognitionTask = speechRecognizer?.recognitionTask(with: recognitionRequest, resultHandler: { (result, error) in
var isFinal = false
// one change here
self.decaration.text = defaultText + " " + (result?.bestTranscription.formattedString)!
isFinal = (result?.isFinal)!
let bottom = NSMakeRange(self.decaration.text.characters.count - 1, 1)
self.decaration.scrollRangeToVisible(bottom)
if error != nil || isFinal {
self.audioEngine.stop()
inputNode.removeTap(onBus: 0)
self.recognitionTask = nil
self.recognitionRequest.endAudio()
self.oBtSpeech.isEnabled = true
}
})
let recordingFormat = inputNode.outputFormat(forBus: 0)
inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { (buffer, when) in
self.recognitionRequest.append(buffer)
}
audioEngine.prepare()
do {
try audioEngine.start()
} catch {
print("audioEngine couldn't start because of an error.")
}
}
result?.bestTranscription.formattedString
returns the entire phrase that was recognised, thats why you should reset self.decaration.text
each time you get a response from SFSpeechRecognnizer
.