Use Vosk speech recognition with Python


I'm trying to use Vosk speech recognition in a Python script, but the result is always:

{
  "text" : ""
}

It's not a problem with my file, because when I run "vosk-transcriber -l fr -i speech3.wav -o test6.txt" in the Windows command prompt, it works perfectly and I get a test6.txt with an accurate transcription.

Here is my Python code:

import vosk

# Load the Vosk model
model = vosk.Model("voskSmallFr")

# Initialize the recognizer with the model
recognizer = vosk.KaldiRecognizer(model, 16000)

# Sample audio file for recognition
audio_file = "speech3.wav"

# Open the audio file
with open(audio_file, "rb") as audio:
    while True:
        # Read a chunk of the audio file
        data = audio.read(4000)
        if len(data) == 0:
            break
        # Recognize the speech in the chunk
        recognizer.AcceptWaveform(data)

# Get the final recognized result
result = recognizer.FinalResult()
print(result)

I downloaded and tried every French model available on the official Vosk website (4 in total; my WAV file is in French). The script runs without errors but returns no text, unlike the Windows command...

Any ideas? Thank you
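A quick way to see how the file differs from what the recognizer expects is to read its header with Python's standard wave module. Note that the script above opens the file in plain binary mode, so the WAV header bytes are passed to AcceptWaveform along with the audio, and the sample rate is hard-coded to 16000 whatever the file actually uses. The sketch below is a diagnostic aid, not part of Vosk: the function names describe_wav and is_vosk_compatible are hypothetical, and the mono / 16-bit / uncompressed criteria simply restate the requirement given in the solution.

```python
import wave


def describe_wav(path):
    """Return the header fields that matter when feeding a WAV to Vosk."""
    with wave.open(path, "rb") as wf:
        return {
            "channels": wf.getnchannels(),
            "sample_width_bytes": wf.getsampwidth(),
            "frame_rate": wf.getframerate(),
            "compression": wf.getcomptype(),
        }


def is_vosk_compatible(path):
    """True if the file is mono, 16-bit (2 bytes per sample), uncompressed PCM."""
    info = describe_wav(path)
    return (
        info["channels"] == 1
        and info["sample_width_bytes"] == 2
        and info["compression"] == "NONE"
    )
```

Running print(describe_wav("speech3.wav")) shows at once whether the file is stereo or uses a rate other than the one passed to KaldiRecognizer. The wave module only opens integer PCM files, so a 32-bit float WAV raises wave.Error here, which is itself a sign the file needs converting.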


Solution

  • I'm answering my own question to post the final solution to my problem, but it's mainly thanks to Lewis's answers and comments below. Thank you, Lewis! The input .wav file must be PCM 16-bit mono, which can be obtained by running ffmpeg -i "speech3.wav" -ac 1 -acodec pcm_s16le "outfile.wav" in the Windows cmd after installing ffmpeg (-ac 1 downmixes to a single channel, -acodec pcm_s16le writes 16-bit PCM).

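    import json
    import wave

    from vosk import KaldiRecognizer, Model, SetLogLevel


    # The .wav file must be PCM 16-bit mono!

    def vosk(wav_file):
        SetLogLevel(0)  # 0 shows Vosk/Kaldi log messages, -1 silences them

        with wave.open(wav_file, "rb") as wf:
            # Fail early instead of silently returning empty text
            if (wf.getnchannels() != 1 or wf.getsampwidth() != 2
                    or wf.getcomptype() != "NONE"):
                raise ValueError("Audio file must be WAV format mono PCM 16-bit.")

            # Local folder containing the French small model
            model = Model(model_path="voskSmallFr", model_name="vosk-model-small-fr-0.22")
            # Use the file's real sample rate instead of hard-coding 16000
            rec = KaldiRecognizer(model, wf.getframerate())
            rec.SetWords(True)
            rec.SetPartialWords(True)

            text = []
            while True:
                data = wf.readframes(4000)
                if len(data) == 0:
                    break
                # AcceptWaveform returns True when silence ends an utterance:
                # save that utterance's result
                if rec.AcceptWaveform(data):
                    text.append(json.loads(rec.Result())["text"])
            # Flush whatever audio remains after the last detected silence
            text.append(json.loads(rec.FinalResult())["text"])

        # Join the utterances, skipping empty ones
        return " ".join(t for t in text if t)


    print(vosk("outfile.wav"))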
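If installing ffmpeg is not an option, the downmix step can be sketched with the standard library alone. This is an alternative technique, not what was used above: it assumes the input is already 16-bit integer PCM (other sample widths are rejected rather than converted), averages the channels of each frame, and keeps the original sample rate, since the recognizer is created with wf.getframerate(). The function name downmix_to_mono is hypothetical.

```python
import array
import wave


def downmix_to_mono(src_path, dst_path):
    """Average all channels of a 16-bit PCM WAV file into a mono 16-bit WAV.

    Sketch only: assumes 16-bit integer PCM input and keeps the original
    sample rate; other sample widths raise ValueError instead of converting.
    """
    with wave.open(src_path, "rb") as src:
        if src.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM input")
        channels = src.getnchannels()
        rate = src.getframerate()
        # "h" = signed 16-bit integers; samples are interleaved frame by frame
        samples = array.array("h", src.readframes(src.getnframes()))

    # Frame i holds samples[i*channels:(i+1)*channels]; average each frame.
    mono = array.array(
        "h",
        (
            sum(samples[i:i + channels]) // channels
            for i in range(0, len(samples), channels)
        ),
    )

    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(rate)
        dst.writeframes(mono.tobytes())
```

For example, downmix_to_mono("speech3.wav", "outfile.wav") produces a file that can then be passed to the recognizer. Plain averaging is the simplest possible downmix, and a pure-Python loop is slow on long recordings, so ffmpeg remains the more practical choice when it is available.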