swift avspeechsynthesizer ios16

Recording speech synthesis to a saved file


Below is the code I've put together to take a phrase, save it to a file, then play that saved file. I'm not sure which part isn't working: the file name may be wrong, the file may not be saved, or the file may not be found. Any help would be appreciated. (speakPhrase is just a helper function that confirms the speech synthesizer actually works, which it does.)

import AVFoundation
import Foundation

class Coordinator {

    let synthesizer: AVSpeechSynthesizer
    var player: AVAudioPlayer?

    init() {
        let synthesizer = AVSpeechSynthesizer()
        self.synthesizer = synthesizer
    }
    
    var recordingPath:  URL {
        let soundName = "Finally.caf"
        // I've tried numerous file extensions.  .caf was in an answer somewhere else.  I would think it would be
        // .pcm, but that doesn't work either.
        
        // Local Directory
        let paths = FileManager.default.urls(for: .documentDirectory, in: .userDomainMask)
        return paths[0].appendingPathComponent(soundName)
    }

    func speakPhrase(phrase: String) {
        let utterance = AVSpeechUtterance(string: phrase)
        utterance.voice = AVSpeechSynthesisVoice(language: "en")
        synthesizer.speak(utterance)
    }
    
    func playFile() {
        print("Trying to play the file")
        
        do {
            try AVAudioSession.sharedInstance().setCategory(.playback, mode: .default)
            try AVAudioSession.sharedInstance().setActive(true)
            
            player = try AVAudioPlayer(contentsOf: recordingPath, fileTypeHint: AVFileType.caf.rawValue)
            guard let player = player else { return }
            player.play()
        } catch {
            print("Error playing file.")
        }
    }
    
    func saveAVSpeechUtteranceToFile() {
        
        let utterance = AVSpeechUtterance(string: "This is speech to record")
        utterance.voice = AVSpeechSynthesisVoice(language: "en-US")
        utterance.rate = 0.50
        
        synthesizer.write(utterance) { [self] (buffer: AVAudioBuffer) in
            guard let pcmBuffer = buffer as? AVAudioPCMBuffer else {
                fatalError("unknown buffer type: \(buffer)")
            }
            if pcmBuffer.frameLength == 0 {
                // Done
            } else {
                // append buffer to file
                do {
                    let audioFile = try AVAudioFile(forWriting: recordingPath, settings: pcmBuffer.format.settings, commonFormat: .pcmFormatInt16, interleaved: false)
                    try audioFile.write(from: pcmBuffer)
                } catch {
                    print(error.localizedDescription)
                }
            }
        }
    }
}

Solution

  • Did you notice that the bufferCallback in the function below is called multiple times?

    func write(_ utterance: AVSpeechUtterance,toBufferCallback bufferCallback: @escaping AVSpeechSynthesizer.BufferCallback)
    

    So the root cause is pretty simple: the AVSpeechUtterance's audio is delivered in multiple parts. On my iPhone, the callback is called about 20 times. If you create a new audio file in the closure every time, each call overwrites the previous file, and you end up with a very tiny audio file (on my iPhone it was a 6 KB file) that holds only the last buffer. That audio is too short to notice when you play it.
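To see the repeated callbacks for yourself, a minimal sketch like the one below counts the buffers delivered for one utterance. `BufferCounter` and its properties are hypothetical names used only for illustration. The counts will vary by device and voice, and the callback may arrive off the main thread, so treat this as a diagnostic rather than production code.

```swift
import AVFoundation

/// Hypothetical helper: counts how often the buffer callback fires.
final class BufferCounter {
    // Keep a strong reference so the synthesizer outlives the call.
    let synthesizer = AVSpeechSynthesizer()
    private(set) var callbackCount = 0
    private(set) var totalFrames: AVAudioFrameCount = 0

    func run(phrase: String) {
        let utterance = AVSpeechUtterance(string: phrase)
        utterance.voice = AVSpeechSynthesisVoice(language: "en-US")

        synthesizer.write(utterance) { [weak self] buffer in
            guard let self, let pcm = buffer as? AVAudioPCMBuffer else { return }
            if pcm.frameLength == 0 {
                // An empty buffer marks the end of the utterance.
                print("done: \(self.callbackCount) buffers, \(self.totalFrames) frames")
            } else {
                self.callbackCount += 1
                self.totalFrames += pcm.frameLength
            }
        }
    }
}
```

Calling `BufferCounter().run(phrase: "This is speech to record")` from an object that stays alive should log the totals once synthesis finishes.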


    So replace the function with:

    func saveAVSpeechUtteranceToFile() {
        
        let utterance = AVSpeechUtterance(string: "This is speech to record")
        utterance.voice = AVSpeechSynthesisVoice(language: "en-US")
        utterance.rate = 0.50
        
        // Only create new file handle if `output` is nil.
        var output: AVAudioFile?
        
        synthesizer.write(utterance) { [self] (buffer: AVAudioBuffer) in
            guard let pcmBuffer = buffer as? AVAudioPCMBuffer else {
                fatalError("unknown buffer type: \(buffer)")
            }
            if pcmBuffer.frameLength == 0 {
                // Done: release the file so it is closed before playback.
                output = nil
            } else {
                do {
                    // This closure is called multiple times, so create the file only once
                    // and append every later buffer to it.
                    if output == nil {
                        output = try AVAudioFile(
                            forWriting: recordingPath,
                            settings: pcmBuffer.format.settings,
                            commonFormat: pcmBuffer.format.commonFormat,
                            interleaved: false)
                    }
                    try output?.write(from: pcmBuffer)
                } catch {
                    print(error.localizedDescription)
                }
            }
        }
    }
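One way to confirm that the saved file now holds the whole utterance is to read it back and compute its duration. The helper below is a sketch, and `audioDuration(of:)` is a hypothetical name. It assumes the file can be opened with `AVAudioFile(forReading:)` and returns nil otherwise.

```swift
import AVFoundation

/// Hypothetical helper: duration in seconds of the audio file at `url`,
/// or nil if the file cannot be opened for reading.
func audioDuration(of url: URL) -> TimeInterval? {
    guard let file = try? AVAudioFile(forReading: url) else { return nil }
    let sampleRate = file.processingFormat.sampleRate
    guard sampleRate > 0 else { return nil }
    // `length` is the number of sample frames in the file.
    return Double(file.length) / sampleRate
}
```

A result of a few seconds for the sample phrase suggests every buffer was appended. A result close to zero suggests the file is still being recreated on each callback.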
    

    BTW, I uploaded a GitHub demo here.

    Finally, here is how to inspect the file contents on an iOS device.

    In Xcode, open Window -> Devices and Simulators, select your device, pick your app under Installed Apps, and use the Download Container option to copy out your app's content.

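If pulling files off the device is inconvenient, a quick alternative is to log the file size from inside the app. This is only a sketch, and `fileSize(at:)` is a hypothetical helper name. It uses plain `FileManager` attributes and returns nil when the file does not exist.

```swift
import Foundation

/// Hypothetical helper: size in bytes of the file at `url`, or nil if it is missing.
func fileSize(at url: URL) -> Int? {
    guard let attributes = try? FileManager.default.attributesOfItem(atPath: url.path),
          let size = attributes[.size] as? NSNumber else {
        return nil
    }
    return size.intValue
}
```

For example, printing `fileSize(at: recordingPath)` after synthesis finishes shows whether the file exists and whether it is still only a few kilobytes.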