python-3.x google-speech-to-text-api vosk

Vosk-api Python for speech recognition: is there a feature like Google's speech adaptation?


Vosk-api is a brilliant offline speech recognizer with brilliant support, but as of this post (14 Aug 2020) its documentation is very poor (or smartly hidden).

The question is: does Vosk have any replacement for the Google speech recognizer feature that improves transcription through speech adaptation?

E.g.

"config": {
    "encoding":"LINEAR16",
    "sampleRateHertz": 8000,
    "languageCode":"en-US",
    "speechContexts": [{
      "phrases": ["weather"]
    }]
}

For Google, this config gives the phrase "weather" higher priority than, say, "whether", which sounds the same.

Or is there something like class tokens? I understand that this may not be implemented in Vosk for Python 3, but still...

Here are references:

https://cloud.google.com/speech-to-text/docs/class-tokens


https://cloud.google.com/speech-to-text/docs/speech-adaptation


Solution

  • You can follow this document for information on Vosk model adaptation:

    https://alphacephei.com/vosk/adaptation

    Basically there are 4 levels:

    1. Update the small model with a list of words to recognize
    2. Update the small model offline with a language model built from texts
    3. Update the language model and the dictionary inside the big model
    4. Fine-tune the acoustic model on your data

    The process is not fully automated, but you can ask in the group for help.
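
As a minimal sketch of the first level (a list of words to recognize), the Vosk Python bindings let you pass a JSON list of phrases as a third argument to `KaldiRecognizer`. The helper names `build_grammar` and `transcribe`, the model path, and the WAV file are assumptions for illustration; this is expected to work only with models that support a runtime word list (typically the small ones), and it assumes a mono 16-bit PCM WAV input. Note that, unlike Google's `speechContexts`, this restricts recognition to the listed phrases rather than merely boosting them, so it is closest in spirit but not equivalent.

```python
import json
import wave


def build_grammar(phrases):
    """Return the JSON word list, with "[unk]" as a catch-all for other speech."""
    return json.dumps(list(phrases) + ["[unk]"])


def transcribe(wav_path, model_path, phrases):
    """Recognize a mono 16-bit PCM WAV file, restricted to the given phrases."""
    # Imported lazily so the grammar helper works without Vosk installed.
    from vosk import KaldiRecognizer, Model

    model = Model(model_path)  # path to an unpacked Vosk model (assumed)
    with wave.open(wav_path, "rb") as wf:
        # The third argument is the JSON list of phrases to recognize.
        rec = KaldiRecognizer(model, wf.getframerate(), build_grammar(phrases))
        while True:
            data = wf.readframes(4000)
            if not data:
                break
            rec.AcceptWaveform(data)
        return json.loads(rec.FinalResult())["text"]
```

Calling `transcribe("audio.wav", "model", ["weather"])` with hypothetical paths would then return only "weather" or "[unk]" tokens; for anything closer to a soft bias, levels 2 to 4 above are the route described in the adaptation document.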