
Google Cloud Platform speech to text - customize transcribed text


I'm using the model adaptation feature of Google Cloud Platform (GCP) Speech-to-Text to improve recognition of utterances that are specific to an industry. For example, when a user says "JSON", it should be transcribed as JSON instead of 'Jason'. I achieve this by using a phrase set with an associated boost value.

In this example, the text is transcribed as Json. I would like it to be transcribed as JSON (all caps).

I have read the GCP documentation thoroughly, but I haven't found a page that addresses my problem. I've also tried Azure, which has an option to upload a pronunciation file. I'm looking for a similar solution in GCP.


Solution

  • I have tried it myself and got the same results, even with maxAlternatives set to 20.

    There is currently no option like a pronunciation file, so I have created a Feature Request to ask for its implementation.
    Remember to star it to get an email notification on every update and, if you can, add your business case and/or impact to give the full picture.

    For now, the workaround would be to implement a "catcher" in your code. In Python, you could use replace() or upper().
    Something along the lines of:

    for result in response.results:
        print("Transcript: {}".format(result.alternatives[0].transcript.replace('Json', 'JSON')))
    

    And if you need to catch more words, loop over a list with an if condition:

    lower_words = ['Json', 'csv']
    upper_words = ['JSON', 'CSV']
    for result in response.results:
        # Work on the transcript string, not on the result object itself
        transcript = result.alternatives[0].transcript
        for lower_word, upper_word in zip(lower_words, upper_words):
            # Only print when the word to replace is actually present
            if lower_word in transcript:
                print("Transcript: {}".format(transcript.replace(lower_word, upper_word)))
    

    Of course, this prints once for each word that matches, so if a result could contain more than one of these words, you may want to store the intermediate result and print it after the inner loop.

    I hope you won't have too many words to change, though, or this could slow down your application.
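
To apply every replacement to a transcript and print it only once, a minimal sketch along the following lines may help. It is not part of the Speech-to-Text API; the names `CANONICAL_FORMS` and `normalize_transcript` are hypothetical, and the word list is only an example. It compiles all terms into one case-insensitive regular expression, so each transcript is scanned once rather than once per word, and the `\b` word boundaries leave words such as "jsonify" untouched.

```python
import re

# Hypothetical mapping: lower-cased recognized form -> preferred written form.
CANONICAL_FORMS = {"json": "JSON", "csv": "CSV"}

# One pattern for all terms; re.escape guards against special characters.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(word) for word in CANONICAL_FORMS) + r")\b",
    re.IGNORECASE,
)


def normalize_transcript(transcript: str) -> str:
    """Return the transcript with every known term in its preferred form."""
    return _PATTERN.sub(
        lambda match: CANONICAL_FORMS[match.group(0).lower()], transcript
    )


# Stand-in string; in the real code this would be
# result.alternatives[0].transcript for each result in response.results.
print("Transcript: {}".format(normalize_transcript("I need a Json file and a csv export")))
```

Inside the loop over response.results, the same function could be called on each transcript before printing or storing it.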