I'm trying to use Vosk speech recognition in a Python script, but the result is always:
{
"text" : ""
}
The audio file itself is not the problem: when I run "vosk-transcriber -l fr -i speech3.wav -o test6.txt" from the Windows command prompt, it works perfectly and I get a test6.txt with an accurate transcription.
Here is my Python script:
import vosk

# Load the Vosk model
model = vosk.Model("voskSmallFr")

# Initialize the recognizer with the model
recognizer = vosk.KaldiRecognizer(model, 16000)

# Sample audio file for recognition
audio_file = "speech3.wav"

# Open the audio file
with open(audio_file, "rb") as audio:
    while True:
        # Read a chunk of the audio file
        data = audio.read(4000)
        if len(data) == 0:
            break
        # Recognize the speech in the chunk
        recognizer.AcceptWaveform(data)

# Get the final recognized result
result = recognizer.FinalResult()
print(result)
I downloaded and tried every French model available on the official Vosk website (4 in total; my wav file is in French). The script runs without errors with each of them but returns an empty result, unlike the Windows command.
Any ideas? Thank you.
I'm answering my own question to post the final solution to my problem, but it's mainly thanks to Lewis's answers and comments below.
Thank you, Lewis!
The input .wav file must be 16-bit PCM mono, which can be obtained by running "ffmpeg -i speech3.wav outfile.wav" in the Windows command prompt after installing ffmpeg.
import wave
import json
from vosk import Model, KaldiRecognizer, SetLogLevel

# The .wav file must be PCM 16-bit mono!
def vosk(wavFile):
    SetLogLevel(0)
    wf = wave.open(wavFile, "rb")
    model = Model(model_path="voskSmallFr", model_name="vosk-model-small-fr-0.22")
    # Use the file's own sample rate instead of a hard-coded value
    rec = KaldiRecognizer(model, wf.getframerate())
    rec.SetWords(True)
    rec.SetPartialWords(True)
    text = []
    while True:
        # Read decoded audio frames (not raw file bytes, which include the WAV header)
        data = wf.readframes(4000)
        if len(data) == 0:
            break
        # If silence is detected, save the result for the finished utterance
        if rec.AcceptWaveform(data):
            text.append(json.loads(rec.Result())["text"])
    # Collect whatever is left after the last chunk
    text.append(json.loads(rec.FinalResult())["text"])
    wf.close()
    # Join the non-empty utterances into a single string
    return " ".join(t for t in text if t)

print(vosk("outfile.wav"))
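Since the fix depends on the file being 16-bit PCM mono, it can help to check the format before handing the file to the recognizer. Below is a minimal sketch that uses only the standard-library wave module; the helper name wav_format_problems is my own and is not part of Vosk.

```python
import wave


def wav_format_problems(path):
    """Return a list of reasons the file is not 16-bit mono PCM (empty if it is fine)."""
    problems = []
    # wave.open raises wave.Error for WAV encodings it cannot read
    with wave.open(path, "rb") as wf:
        if wf.getnchannels() != 1:
            problems.append(f"expected 1 channel, found {wf.getnchannels()}")
        if wf.getsampwidth() != 2:
            problems.append(f"expected 16-bit samples, found {8 * wf.getsampwidth()}-bit")
        if wf.getcomptype() != "NONE":
            problems.append(f"expected uncompressed PCM, found {wf.getcomptype()}")
    return problems
```

Calling wav_format_problems("outfile.wav") should return an empty list for a correctly converted file. If it reports more than one channel, adding ffmpeg's "-ac 1" option to the conversion command should force mono output, and "-c:a pcm_s16le" should force 16-bit PCM; I have not needed either for my file, so treat this as a suggestion.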