How to convert speech to text in Python - opus file format


I have some .opus audio files that need to be converted to text in order to run some analytics. I am aware that there is the Python SpeechRecognition package that can do this with .wav files as demonstrated in this tutorial.

Does anyone know how to convert .opus files to text, or convert .opus to .wav?

I have tried the Python SpeechRecognition package with no success.


Solution

  • This solution calls ffmpeg from Python (ffmpeg must be installed and available on the PATH) to convert every .opus file in the specified directory to .wav, and then runs speech recognition on the resulting .wav files using the speech_recognition module:

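    import os
    import subprocess
    import speech_recognition as sr
    
    path = './audio-files/'
    file_type_to_convert = ".opus"
    file_type_to_recognize = ".wav"
    
    # Convert every .opus file in the directory to .wav with ffmpeg
    for filename in os.listdir(path):
        if filename.endswith(file_type_to_convert):
            source_file = os.path.join(path, filename)
            target_file = os.path.join(path, filename[:-len(file_type_to_convert)] + file_type_to_recognize)
            # Passing an argument list avoids shell-quoting problems with unusual file names
            subprocess.run(["ffmpeg", "-i", source_file, "-vn", target_file], check=True)
    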
    recognizer = sr.Recognizer()  # Instantiate recognizer
    rec_output = {}  # Dictionary mapping each file name (without extension) to its recognized text
    
    # Iterate over each file of the specified type to be recognized
    for file_to_recognize in os.listdir(path):
        if file_to_recognize.endswith(file_type_to_recognize):
            with sr.AudioFile(os.path.join(path, file_to_recognize)) as source:
                audio_data = recognizer.record(source)  # Read the whole file into memory
            # Recognize and store the output
            # Note: recognize_google needs an internet connection; recognize_sphinx (CMU Sphinx engine) works offline
            rec_output[file_to_recognize[:-len(file_type_to_recognize)]] = recognizer.recognize_google(
                audio_data, language='en-US')
    
    # Display each file's output: the file name, followed by the words recognized in that file
    for key, val in rec_output.items():
        print(key)
        print(val)
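
The conversion step can also be written with pathlib and an explicit ffmpeg argument list. This is only a minimal sketch and not part of the original answer: the helper names `build_ffmpeg_command` and `convert_directory` are hypothetical, and the 16 kHz mono settings (`-ar 16000 -ac 1`) are an assumption that tends to suit speech recognition, not something speech_recognition requires.

```python
import shutil
import subprocess
from pathlib import Path


def build_ffmpeg_command(opus_file, sample_rate=16000, channels=1):
    """Return the ffmpeg argument list that converts one .opus file to .wav."""
    opus_file = Path(opus_file)
    wav_file = opus_file.with_suffix(".wav")
    return [
        "ffmpeg",
        "-y",                      # overwrite an existing .wav without prompting
        "-i", str(opus_file),      # input file
        "-vn",                     # ignore any video stream
        "-ar", str(sample_rate),   # output sample rate in Hz (assumed value)
        "-ac", str(channels),      # number of output channels (assumed mono)
        str(wav_file),             # output file, same name with a .wav suffix
    ]


def convert_directory(directory):
    """Convert every .opus file in `directory` and return the new .wav paths."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    wav_files = []
    for opus_file in sorted(Path(directory).glob("*.opus")):
        command = build_ffmpeg_command(opus_file)
        # check=True raises CalledProcessError if ffmpeg exits with an error
        subprocess.run(command, check=True)
        wav_files.append(Path(command[-1]))
    return wav_files
```

Calling `convert_directory('./audio-files/')` would return the list of converted .wav paths, which can then be opened with `sr.AudioFile` in the same way as in the answer's recognition loop.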