I'm using the model adaptation feature of Google Cloud Platform (GCP) Speech-to-Text to recognize utterances that are specific to an industry. For example, when a user says "JSON", it should be transcribed as JSON instead of 'Jason'. I achieve this by using a phrase set and an associated boost value.
The text in this example is transcribed as Json. I would like it to be transcribed as JSON (all caps).
I have thoroughly read the GCP documentation, but I haven't found a document that addresses my problem. I've also tried Azure, where there's an option to upload a pronunciation file. I'm looking for a similar solution in GCP.
I have tried it myself and got the same results, even with maxAlternatives set to 20.
There is currently no option like a pronunciation file, so I have created a Feature Request to ask for its implementation.
Remember to star it in order to get an email notification on every update. And, if you can, add your business case and/or impact to give the full picture.
For now, the workaround would be to implement a "catcher" in your code.
In Python, you could use replace() or upper(). Something along the lines of:
    for result in response.results:
        print("Transcript: {}".format(result.alternatives[0].transcript.replace('Json', 'JSON')))
And if you need to catch more words, loop over a list with an if condition:
    # Example transcript: 'I need a Json file'
    lower_words = ['Json', 'csv']
    upper_words = ['JSON', 'CSV']
    for result in response.results:
        transcript = result.alternatives[0].transcript
        for lower_word, upper_word in zip(lower_words, upper_words):
            # Check the transcript string, not the result object
            if lower_word in transcript:
                print("Transcript: {}".format(transcript.replace(lower_word, upper_word)))
Of course, this will print at each iteration that meets the condition, so if a result can contain more than one of these words, you may want to store the intermediate result and print it after the nested loop.
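Storing the intermediate result could look something like the sketch below. The helper name fix_casing and the REPLACEMENTS dict are made up for illustration, and the sample strings stand in for result.alternatives[0].transcript so the snippet runs without calling the API.

```python
# Hypothetical mapping from the casing the API returns to the casing you want.
REPLACEMENTS = {"Json": "JSON", "csv": "CSV"}


def fix_casing(transcript, replacements=REPLACEMENTS):
    """Apply every replacement and return the final string once."""
    fixed = transcript
    for wrong, right in replacements.items():
        if wrong in fixed:
            # Keep the intermediate result so later replacements build on it.
            fixed = fixed.replace(wrong, right)
    return fixed


# Stand-in transcripts; in the real code these would come from
# result.alternatives[0].transcript for each result in response.results.
transcripts = ["I need a Json file", "convert the Json to csv"]
for transcript in transcripts:
    # Print once per result, after all replacements have been applied.
    print("Transcript: {}".format(fix_casing(transcript)))
```

Note that replace() matches substrings, so a word such as "Jsonify" would also be changed.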
I hope you won't have too many words to change, though, or this could slow down your application.
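If the list does grow, one option is to compile all the terms into a single regular expression so each transcript is scanned once instead of once per word. This is only a sketch under my own assumptions: the names CANONICAL, PATTERN and normalize_terms are invented for the example, and I have not measured how much it helps in practice.

```python
import re

# Hypothetical lookup table: lowercase term -> casing you want in the output.
CANONICAL = {"json": "JSON", "csv": "CSV"}

# One alternation of all terms, matched as whole words, ignoring case.
PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(term) for term in CANONICAL) + r")\b",
    re.IGNORECASE,
)


def normalize_terms(transcript):
    """Replace every known term with its canonical casing in one pass."""
    return PATTERN.sub(lambda match: CANONICAL[match.group(0).lower()], transcript)


print(normalize_terms("I need a Json file and a csv"))
```

The word boundaries (\b) mean that only whole words are replaced, and the case-insensitive match also catches variants such as "json" or "Csv".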