python speech-to-text azure-speech file-ownership

Microsoft Speech SDK keeps using the audio file even after transcription. How can I tell when it will release the file, or force it to?


I am using the Microsoft Speech SDK to transcribe audio WAV files, which I receive as binary data through an API. I tried to use that binary data directly, but I couldn't make it the input to the SDK functions. So I wrote a function that first saves the audio as a WAV file and then passes its path to the SDK functions. Once that is done I no longer need the file, so I call os.remove() on it, but every time I get an error saying that another process is using the file. I debugged and found that one of the SDK functions is the one holding the file.

Here is my code:

def function_that_gets_binary(file):
    # Save the uploaded audio to disk so its path can be passed to the SDK
    path = os.path.join("Data_to_remove", file.filename)
    with open(path, "wb") as f:  # the with block closes the file on exit
        f.write(file.file.read())
    transcript, time_seconds = MS_SDK(path)


def MS_SDK(voice):
    audio_config = speechsdk.audio.AudioConfig(filename=voice)
    speech_config.speech_recognition_language = "ar-SA"
    speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)
    done = False

    def stop_cb(evt):
        """callback that stops continuous recognition upon receiving an event `evt`"""
        nonlocal done
        done = True
        speech_recognizer.stop_continuous_recognition()
        print("Stopped")
        os.remove(voice)

    full_text = []
    # Connect callbacks to the events fired by the speech recognizer
    speech_recognizer.recognized.connect(lambda evt: full_text.append(format(evt.result.text)))
    speech_recognizer.session_stopped.connect(stop_cb)

    # Start continuous speech recognition
    start = timeit.default_timer()
    speech_recognizer.start_continuous_recognition()
    while not done:
        time.sleep(0.5)
    transcript = " ".join(full_text)
    end = timeit.default_timer()
    duration = "{} seconds".format(round((end - start), 6))

    return transcript, duration

I tried adding a sleep to give the process time to release the file, but I got the same error. I also tried a separate remove function that runs in the background after a 20-second sleep, and got the same error! I don't understand what is happening, because by then I have already received the transcript and the process has printed "Stopped".


Solution

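  • With the code below I converted speech to text and then successfully deleted the audio .wav file. Compared with the code in the question, os.remove() is no longer called inside the session_stopped callback, where the recognizer apparently still has the file open. Instead, the recognizer is cleaned up explicitly at the end of MS_SDK, and the file is deleted only after MS_SDK has returned.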

    Code:

    import time
    import timeit
    import os
    import azure.cognitiveservices.speech as speechsdk
    
    def MS_SDK(voice):
        """Transcribe the WAV file at path `voice`; return (transcript, duration)."""
        audio_config = speechsdk.audio.AudioConfig(filename=voice)
        speech_config = speechsdk.SpeechConfig(subscription="<speech_key>", region="<speech_region>")
        speech_config.speech_recognition_language = "ar-SA"
        speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)
        done = False
    
        def stop_cb(evt):
            """Callback that stops continuous recognition upon receiving an event `evt`."""
            nonlocal done
            done = True
            speech_recognizer.stop_continuous_recognition()
            print("Stopped")
            # No os.remove() here: the file is deleted later, by the caller
    
        full_text = []
        # Collect each recognized phrase as it arrives
        speech_recognizer.recognized.connect(lambda evt: full_text.append(evt.result.text))
        speech_recognizer.session_stopped.connect(stop_cb)
    
        start = timeit.default_timer()
        speech_recognizer.start_continuous_recognition()
        # Wait until the session_stopped callback sets `done`
        while not done:
            time.sleep(0.5)
        transcript = " ".join(full_text)
        end = timeit.default_timer()
        duration = "{} seconds".format(round((end - start), 6))
        # Explicitly run the recognizer's cleanup so it lets go of the audio file.
        # Calling __del__ directly is unusual, but it is what worked here.
        speech_recognizer.__del__()
        return transcript, duration
    
    def transcribe_audio_and_print(audio_file_path):
        transcript, duration = MS_SDK(audio_file_path)
        print("Transcript:", transcript)
        print("Duration:", duration)
    
        # MS_SDK has returned, so nothing in it still refers to the file
        try:
            os.remove(audio_file_path)
            print("File deleted successfully.")
        except OSError as e:
            print("Error occurred while deleting file:", e)
    
    audio_file_path = "path to wav file/Data_to_remove/<filename>.wav"
    transcribe_audio_and_print(audio_file_path)
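
The important part of the code above is the ordering: the file is removed only after the function that owns the recognizer has returned. A minimal, stdlib-only sketch of that cleanup pattern for the upload case in the question follows. The names `transcribe_upload` and `transcribe` are hypothetical; `transcribe` stands in for a wrapper around `MS_SDK`, and the sketch assumes that wrapper has released the file by the time it returns.

```python
import os
import tempfile
from typing import Callable


def transcribe_upload(data: bytes, transcribe: Callable[[str], str]) -> str:
    """Write uploaded bytes to a temp .wav file, transcribe it, then delete it."""
    # delete=False plus an explicit close: on Windows another reader cannot
    # open the file while this handle is still open.
    tmp = tempfile.NamedTemporaryFile(suffix=".wav", delete=False)
    try:
        tmp.write(data)
        tmp.close()
        # `transcribe` is a hypothetical stand-in for MS_SDK; it is assumed
        # to have released the file by the time it returns.
        return transcribe(tmp.name)
    finally:
        tmp.close()  # no-op if already closed
        try:
            os.remove(tmp.name)
        except OSError as exc:
            print("Could not delete", tmp.name, "-", exc)
```

With the function from this answer it might be called as `transcribe_upload(file.file.read(), lambda p: MS_SDK(p)[0])`. The unique temp name also avoids collisions when two uploads share a filename.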
    

    Output:

    Running the script converted the speech to text and then deleted the audio .wav file, as shown below.

    C:\Users\xxxxxxxx\Documents\xxxxxxxxx>python main.py
    Stopped
    Transcript: Hello this is a test of the speech synthesis service.
    Duration: 3.507588 seconds
    File deleted successfully.
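
Since the audio already arrives as bytes, another option may be to skip the file entirely and feed the SDK from memory, which sidesteps the deletion problem. The sketch below is untested against the service and rests on assumptions: the upload is an uncompressed PCM WAV that Python's `wave` module can parse, and the helper names `pcm_from_wav_bytes` and `audio_config_from_wav_bytes` are hypothetical. The SDK classes used (`PushAudioInputStream`, `AudioStreamFormat`, `AudioConfig(stream=...)`) are part of the Speech SDK's `audio` module.

```python
import io
import wave


def pcm_from_wav_bytes(data: bytes):
    """Return (raw PCM frames, sample rate, bits per sample, channels)."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        frames = wav.readframes(wav.getnframes())
        return frames, wav.getframerate(), wav.getsampwidth() * 8, wav.getnchannels()


def audio_config_from_wav_bytes(data: bytes):
    """Build an AudioConfig backed by an in-memory push stream (no file on disk)."""
    # Imported here so the WAV parsing above can be used without the SDK.
    import azure.cognitiveservices.speech as speechsdk

    frames, rate, bits, channels = pcm_from_wav_bytes(data)
    fmt = speechsdk.audio.AudioStreamFormat(
        samples_per_second=rate, bits_per_sample=bits, channels=channels
    )
    stream = speechsdk.audio.PushAudioInputStream(stream_format=fmt)
    stream.write(frames)  # header already stripped; only PCM samples are pushed
    stream.close()        # marks the end of the audio
    return speechsdk.audio.AudioConfig(stream=stream)
```

In `MS_SDK`, the line `speechsdk.audio.AudioConfig(filename=voice)` would then become `audio_config_from_wav_bytes(file.file.read())`, and no `os.remove()` is needed.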