
Speech-to-Text Phrase Exceeds Character Limit


I am using the Google Speech-to-Text client library for Python with speech adaptation. I want to boost phrases that follow a certain pattern, so I followed the speech adaptation documentation to create custom classes and a phrase set and combine them into a SpeechAdaptation object.

from google.cloud.speech_v1p1beta1 import types
from google.cloud.speech_v1p1beta1.types import CustomClass, PhraseSet, SpeechAdaptation

movement_words = ["move", "go", "turn", "rotate"]
class_items = [CustomClass.ClassItem(value=word) for word in movement_words]
movement_custom_class = CustomClass(name="Movement Words", custom_class_id="movement_words", items=class_items)

direction_words = ["forward", "forwards", "backward", "backwards", "back", "left", "right", "clockwise", "counterclockwise", "to the left", "to the right"]
class_items = [CustomClass.ClassItem(value=word) for word in direction_words]
direction_custom_class = CustomClass(name="Direction Words", custom_class_id="directions", items=class_items)

unit_words = ["meter", "meters", "feet", "foot", "degrees", "radians"]
class_items = [CustomClass.ClassItem(value=word) for word in unit_words]
unit_custom_class = CustomClass(name="Unit Words", custom_class_id="units", items=class_items)

number_first_phrase = PhraseSet(name="number_first_phrase", phrases=[PhraseSet.Phrase(value="${movement_words} $OPERAND ${units} ${directions}")], boost=10)

speech_adaptation_object = SpeechAdaptation(
    phrase_sets=[number_first_phrase],
    phrase_set_references=[],
    custom_classes=[movement_custom_class, direction_custom_class, unit_custom_class],
)

I then pass this object to my RecognitionConfig:

config = types.RecognitionConfig(
    encoding=types.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=RATE,
    language_code=language_code,
    enable_automatic_punctuation=True,
    adaptation=speech_adaptation_object,
)

streaming_config = types.StreamingRecognitionConfig(
    config=config,
    interim_results=True,
)

However, I get the following error message: 400 Invalid recognition 'config': Context phrase with 152 characters found, but max is 100. Splitting the phrase in my PhraseSet in half made the error go away, but I don't understand why the context phrase is reported as having 152 characters when "${movement_words} $OPERAND ${units} ${directions}" is well under 100 characters. I'd really appreciate any guidance on how the character limit works here. Thank you!


Solution

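  • Your phrase "${movement_words} $OPERAND ${units} ${directions}" contains class references: each ${...} refers to one of your custom classes by its custom_class_id, and $OPERAND is a prebuilt class.

    These references are expanded into all the values of the corresponding classes, so the limit applies to the expanded phrase, not to the 49 characters you typed. With the words in your lists, the expanded phrase easily exceeds 100 characters: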

    movement_words = ["move", "go", "turn", "rotate"]
    
    direction_words = ["forward","forwards","backward","backwards","back","left","right","clockwise","counterclockwise","to the left","to the right"]
    
    unit_words = ["meter","meters","feet","foot","degrees","radians"]
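To get a feel for the numbers, the sketch below compares the length of the phrase as typed with the combined length of the values its class references stand for. It is only a rough approximation under an assumption: the exact format the service uses when expanding classes (separators, how $OPERAND is counted) isn't given in the error, so the total will not reproduce the reported 152. It only shows that the class contents alone are already well past the 100-character limit.

```python
# Rough size check for the phrase in the question.
# Assumption: the service counts the phrase after replacing each ${...} class
# reference with the class's values. Its exact expansion format is not known
# here, so this only sums the raw item lengths and will not match 152 exactly.

MAX_PHRASE_CHARS = 100  # limit quoted in the 400 error message

phrase = "${movement_words} $OPERAND ${units} ${directions}"

# Item values copied from the custom classes in the question, keyed by custom_class_id.
custom_classes = {
    "movement_words": ["move", "go", "turn", "rotate"],
    "directions": ["forward", "forwards", "backward", "backwards", "back", "left",
                   "right", "clockwise", "counterclockwise", "to the left", "to the right"],
    "units": ["meter", "meters", "feet", "foot", "degrees", "radians"],
}


def total_item_chars(words):
    """Return the combined length of every item value in one custom class."""
    return sum(len(word) for word in words)


literal_length = len(phrase)
per_class = {class_id: total_item_chars(words) for class_id, words in custom_classes.items()}
item_length = sum(per_class.values())

print(f"phrase as typed: {literal_length} chars")
for class_id, chars in per_class.items():
    print(f"  ${{{class_id}}}: {chars} chars of item values")
print(f"all class items combined: {item_length} chars "
      f"(limit {MAX_PHRASE_CHARS}, exceeded: {item_length > MAX_PHRASE_CHARS})")
```

The phrase as typed is 49 characters, while the item values total 142 (16 for ${movement_words}, 93 for ${directions}, 33 for ${units}). If the expansion assumption holds, ${directions} is the largest contributor, which would make it the natural class to trim or move into a separate, shorter phrase.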