I have some .opus audio files that need to be converted to text in order to run some analytics. I am aware that there is the Python SpeechRecognition package that can do this with .wav files as demonstrated in this tutorial.
Does anyone know how to convert .opus files to text, or convert .opus to .wav?
I have tried the Python SpeechRecognition package with no success.
Here is a solution which uses ffmpeg to first convert all .opus files in the specified directory to .wav, and then performs speech recognition on the resulting .wav files using the speech_recognition module:
import os
import subprocess

import speech_recognition as sr

path = './audio-files/'
file_type_to_convert = ".opus"
file_type_to_recognize = ".wav"

# Convert every .opus file in the directory to .wav with ffmpeg
for filename in os.listdir(path):
    if filename.endswith(file_type_to_convert):
        source_file = os.path.join(path, filename)
        target_file = os.path.join(
            path, filename[:-len(file_type_to_convert)] + file_type_to_recognize)
        # -vn tells ffmpeg to ignore any video stream in the input
        subprocess.run(["ffmpeg", "-i", source_file, "-vn", target_file])

recognizer = sr.Recognizer()  # Instantiate recognizer
rec_output = {}  # Dict mapping each file name (without extension) to its recognized text

# Iterate over each file of the specified type to be recognized
for file_to_recognize in os.listdir(path):
    if file_to_recognize.endswith(file_type_to_recognize):
        audio = sr.AudioFile(os.path.join(path, file_to_recognize))
        with audio as source:
            audio_data = recognizer.record(source)  # Read the whole file
        # Recognize and store the output
        # Note: recognize_google needs an internet connection;
        # recognize_sphinx (CMU Sphinx engine) works offline
        key = file_to_recognize[:-len(file_type_to_recognize)]
        rec_output[key] = recognizer.recognize_google(audio_data, language='en-US')

# Display each file's output
for key, val in rec_output.items():
    print(key)
    print(val)

# Output:
# File name
# Recognized words in each file
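
If you rerun the script often, the conversion step can be pulled into a small helper that skips files which already have a .wav next to them. This is only a sketch under a few assumptions: the helper names (wav_path_for, build_ffmpeg_command, convert_directory) are made up for illustration and are not part of ffmpeg or speech_recognition, ffmpeg is assumed to be on your PATH, and the ffmpeg flags are the same ones used above.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def wav_path_for(opus_file: Path) -> Path:
    """Return the .wav path that sits next to the given .opus file."""
    return opus_file.with_suffix(".wav")


def build_ffmpeg_command(opus_file: Path) -> List[str]:
    """Build the ffmpeg argument list: read the input, ignore video, write .wav."""
    return ["ffmpeg", "-i", str(opus_file), "-vn", str(wav_path_for(opus_file))]


def convert_directory(directory: str) -> List[Path]:
    """Convert every .opus file in `directory`, skipping ones already converted."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    wav_files = []
    for opus_file in sorted(Path(directory).glob("*.opus")):
        target = wav_path_for(opus_file)
        if not target.exists():
            # check=True raises CalledProcessError if ffmpeg fails on a file
            subprocess.run(build_ffmpeg_command(opus_file), check=True)
        wav_files.append(target)
    return wav_files
```

Calling convert_directory('./audio-files/') should return the list of .wav paths, which you can then feed to sr.AudioFile one by one. Passing the arguments as a list avoids having to quote file names that contain spaces.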