So, Vosk-api is a brilliant offline speech recognizer with excellent support, but its documentation is very poor (or cleverly hidden), at least at the time of this post (14 Aug 2020).
The question is: does Vosk have any equivalent of the Google Speech-to-Text feature that improves transcription through speech adaptation?
E.g.
    "config": {
        "encoding": "LINEAR16",
        "sampleRateHertz": 8000,
        "languageCode": "en-US",
        "speechContexts": [{
            "phrases": ["weather"]
        }]
    }
For Google, this config gives the phrase "weather" higher priority than, say, "whether", which sounds the same.
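For reference, the same config can be built from Python as a plain dict and serialized with the standard `json` module. This is only a sketch that reproduces the JSON above; `google_config` is a hypothetical helper name, and nothing here calls the Google API.

```python
import json


def google_config(phrases, sample_rate_hz=8000, language_code="en-US"):
    """Build a Speech-to-Text "config" object that boosts the given phrases."""
    return {
        "encoding": "LINEAR16",
        "sampleRateHertz": sample_rate_hz,
        "languageCode": language_code,
        # Each speech context lists phrases the recognizer should favor.
        "speechContexts": [{"phrases": list(phrases)}],
    }


if __name__ == "__main__":
    # Wrap in a top-level "config" key, matching the snippet above.
    print(json.dumps({"config": google_config(["weather"])}, indent=2))
```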
Or something like class tokens? I understand this may not be implemented in Vosk for Python 3, but still...
Here are references:
https://cloud.google.com/speech-to-text/docs/class-tokens
https://cloud.google.com/speech-to-text/docs/speech-adaptation
You can follow this document for information on Vosk model adaptation:
https://alphacephei.com/vosk/adaptation
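One runtime option in the Vosk Python package is to pass a JSON list of phrases as the third argument of `KaldiRecognizer`, as in the Vosk examples that use `"[unk]"` for out-of-list speech. Note that this restricts recognition to the listed phrases rather than softly boosting them the way Google's `speechContexts` does, and it is assumed to work only with models whose graph supports runtime vocabulary (typically the small models). The helper `vosk_grammar` below is hypothetical; it only builds the grammar string, and the actual recognizer call is shown in a comment because it needs a downloaded model.

```python
import json


def vosk_grammar(phrases, allow_unknown=True):
    """Return a JSON phrase list to pass as a Vosk recognizer grammar."""
    items = list(phrases)
    if allow_unknown:
        # "[unk]" lets the recognizer emit an unknown-word token instead of
        # forcing every utterance onto one of the listed phrases.
        items.append("[unk]")
    return json.dumps(items)


if __name__ == "__main__":
    grammar = vosk_grammar(["what is the weather today", "weather"])
    print(grammar)  # ["what is the weather today", "weather", "[unk]"]
    # Usage with the vosk package (not run here; requires a model on disk):
    #   from vosk import Model, KaldiRecognizer
    #   rec = KaldiRecognizer(Model("path/to/model"), 8000, grammar)
```

For anything beyond a fixed phrase list, the adaptation page above covers rebuilding the language model.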
Basically there are 4 levels:
The process is not fully automated, but you can ask in the group for help.