
INVALID_ARGUMENT: Request payload size exceeds the limit: 10485760 bytes


I'm using the Google Cloud Speech API for the first time, for a project that converts a series of audio files to text. Each file is around 60 minutes long and contains one person talking continuously the whole time. I've installed the Google Cloud SDK and I'm using it to send the requests as shown below:

gcloud ml speech recognize-long-running \
"/path/to/file/audio.flac" \
--language-code="pt-PT" --async

Every time I run this on one of my recordings, I get the following error message:

ERROR: (gcloud.ml.speech.recognize-long-running) INVALID_ARGUMENT: 
Request payload size exceeds the limit: 10485760 bytes.

This seems to be a very strict restriction: if the API can process files of up to 180 minutes, there's no way it would be limited to about 10,000 characters' worth of speech.
I tried splitting the audio files into smaller pieces, down to four 15-minute segments, and I still got the same error. Besides, even if that worked, splitting every new recording I make from now on would be tedious and impractical.
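A rough size check helps explain why even the 15-minute pieces failed. The limit in the error, 10485760 bytes, is exactly 10 × 1024 × 1024 bytes (10 MiB), and it applies to the request payload, which for a local file includes the audio itself, not to the transcript that comes back. The sketch below assumes uncompressed 16-bit mono PCM at 16 kHz (the actual format of the recordings isn't stated); FLAC will be smaller by a recording-dependent amount, so treat the numbers as an upper bound. The function names are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Rough arithmetic for the 10485760-byte request limit (10 * 1024 * 1024 = 10 MiB).
# Assumes uncompressed 16-bit PCM; FLAC files will be smaller by an amount that
# depends on the recording, so these figures are an upper bound.
LIMIT_BYTES=$((10 * 1024 * 1024))

# pcm_bytes RATE_HZ BYTES_PER_SAMPLE CHANNELS SECONDS
# Size in bytes of an uncompressed PCM clip.
pcm_bytes() {
  echo $(( $1 * $2 * $3 * $4 ))
}

# max_inline_seconds RATE_HZ BYTES_PER_SAMPLE CHANNELS
# Longest uncompressed clip (in whole seconds) that stays under the limit.
max_inline_seconds() {
  echo $(( LIMIT_BYTES / ($1 * $2 * $3) ))
}

# fits_inline FILE
# Succeeds if the file on disk is at most LIMIT_BYTES bytes.
fits_inline() {
  local size
  size=$(wc -c < "$1")
  (( size <= LIMIT_BYTES ))
}

echo "15 min at 16 kHz mono: $(pcm_bytes 16000 2 1 900) bytes (limit ${LIMIT_BYTES})"
echo "Max uncompressed clip at 16 kHz mono: $(max_inline_seconds 16000 2 1) s"
```

Under these assumptions a 15-minute clip is 28,800,000 bytes, almost three times the limit, and only about 327 seconds (roughly 5.5 minutes) of uncompressed audio fits. Even after FLAC compression, 15-minute segments can plausibly stay above 10 MiB, which is consistent with the error described here.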

I've been searching, but so far I haven't found a way to increase or work around this limit. I'm on a free trial account, and I'm happy to upgrade to a paid subscription if that raises the limit. As far as I understand, however, the limit still applies on a paid subscription.

Has anyone found any solution for this problem?


Solution

  • After talking with Google Cloud support, I concluded that this was due to a limitation of my free trial subscription combined with the size of my file (~60 min).

    After upgrading to the paid subscription and uploading my file to Google Cloud Storage, I was able to retrieve the transcription result.

    $ gcloud ml speech recognize-long-running "gs://test-bucket/my_audio_file.flac" --language-code="pt-PT" --async
    Check operation [7456984365978465938] for status.
    {
      "name": "7456984365978465938"
    }
    
    $ gcloud ml speech operations describe 7456984365978465938
    {
      ... payload ...
    }
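The fix above comes down to two steps: copy the recording to a Cloud Storage bucket, then pass its gs:// URI instead of a local path, so the request carries only the URI and the audio is read from storage rather than sent in the request body. Here is a minimal sketch wrapping both steps, assuming the Cloud SDK's `gsutil` is installed and authenticated. The function `transcribe_via_gcs` and any bucket name you pass are hypothetical placeholders. Setting `DRY_RUN=1` only prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Sketch: transcribe a long local recording by first copying it to Cloud Storage,
# so the request carries only a gs:// URI instead of the audio bytes.
# The function name and bucket are hypothetical; set DRY_RUN=1 to print commands.
set -euo pipefail

# transcribe_via_gcs FILE BUCKET [LANGUAGE_CODE]
transcribe_via_gcs() {
  if [[ $# -lt 2 ]]; then
    echo "usage: transcribe_via_gcs FILE BUCKET [LANGUAGE_CODE]" >&2
    return 2
  fi
  local file="$1" bucket="$2" lang="${3:-pt-PT}"
  local uri="gs://${bucket}/$(basename "$file")"

  # When DRY_RUN=1, prefix each command with echo instead of running it.
  local run=""
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    run="echo"
  fi

  # 1. Upload the recording to the bucket.
  $run gsutil cp "$file" "$uri"
  # 2. Start asynchronous recognition on the uploaded object, as in the answer.
  $run gcloud ml speech recognize-long-running "$uri" \
    --language-code="$lang" --async
}
```

For example, `transcribe_via_gcs audio.flac my-transcripts-bucket` should print an operation ID like the one shown above, which can then be passed to `gcloud ml speech operations describe`.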