Tags: google-cloud-platform, speech-recognition, speech-to-text, google-speech-api, google-cloud-speech

Google Speech to Text Error: "Invalid recognition 'config': bad encoding.." for an MP3 file


I'm recording audio in a React web app using the "mic-recorder-to-mp3" npm package.

I've used MediaInfo to inspect the audio files this library produces (here's a sample file), and it reports the following: [screenshot: Audio File Info]

So the file doesn't appear to be corrupted. However, when I call Google's Speech-to-Text API with the following code, I get the error: "Invalid recognition 'config': bad encoding.."

const fs = require('fs');
const speech = require('@google-cloud/speech');

async function transcribe(filename) {
    const client = new speech.SpeechClient();

    // Configure the request:
    const config = {
        enableWordTimeOffsets: true,
        sampleRateHertz: 48000,
        encoding: 'MP3',
        languageCode: 'en-US',
    };
    const audio = {
        // Raw audio bytes, sent as a base64-encoded string
        content: fs.readFileSync(filename).toString('base64'),
    };
    const request = {
        config: config,
        audio: audio,
    };

    // Detects speech in the audio file
    const [response] = await client.recognize(request);
    return response;
}
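To check in code what MediaInfo reports, and in particular that the file's sample rate matches `sampleRateHertz: 48000` in the config, one can decode the first MPEG audio frame header. This is a minimal sketch, not part of any library: `probeMp3` and `probeMp3File` are hypothetical helper names, and the code assumes the file is plain MPEG audio, optionally preceded by an ID3v2 tag, and only inspects the first plausible frame header.

```javascript
const fs = require('fs');

// Sample rates indexed by [version bits][sampling-rate index].
// Version bits: 0 = MPEG 2.5, 1 = reserved, 2 = MPEG 2, 3 = MPEG 1.
const SAMPLE_RATES = [
  [11025, 12000, 8000],
  null,
  [22050, 24000, 16000],
  [44100, 48000, 32000],
];

// Returns the offset just past a leading ID3v2 tag, or 0 if there is none.
function skipId3v2(buf) {
  if (buf.length < 10 || buf.toString('latin1', 0, 3) !== 'ID3') return 0;
  // The tag size is a 28-bit "syncsafe" integer: 4 bytes, 7 bits each.
  const size =
    ((buf[6] & 0x7f) << 21) |
    ((buf[7] & 0x7f) << 14) |
    ((buf[8] & 0x7f) << 7) |
    (buf[9] & 0x7f);
  const hasFooter = (buf[5] & 0x10) !== 0;
  return 10 + size + (hasFooter ? 10 : 0);
}

// Scans for the first plausible MPEG audio frame header and decodes
// its layer and sample rate. Returns null if none is found.
function probeMp3(buf) {
  for (let i = skipId3v2(buf); i + 3 < buf.length; i++) {
    // 11-bit frame sync: 0xFF, then the top 3 bits of the next byte set.
    if (buf[i] !== 0xff || (buf[i + 1] & 0xe0) !== 0xe0) continue;
    const versionBits = (buf[i + 1] >> 3) & 0b11;
    const layerBits = (buf[i + 1] >> 1) & 0b11; // 1 = Layer III, 2 = II, 3 = I
    const bitrateIndex = buf[i + 2] >> 4;
    const rateIndex = (buf[i + 2] >> 2) & 0b11;
    const rates = SAMPLE_RATES[versionBits];
    // Reserved or invalid field values mean this was a false sync match.
    if (!rates || layerBits === 0 || bitrateIndex === 0xf || rateIndex === 3) continue;
    return { offset: i, layer: 4 - layerBits, sampleRateHertz: rates[rateIndex] };
  }
  return null;
}

// Convenience wrapper that reads the whole file from disk.
function probeMp3File(path) {
  return probeMp3(fs.readFileSync(path));
}

module.exports = { probeMp3, probeMp3File };
```

For a recording that matches the config above, `probeMp3File('audioClip.mp3')` would be expected to report `layer: 3` and `sampleRateHertz: 48000`; a mismatch there would be worth fixing before looking at the API side.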

I can't figure out what's going wrong here; any help would be appreciated!


Solution

  • I was able to reproduce the issue, and the encoding appears to be the root cause. Running the gcloud ml speech recognize command against the MP3 file returned an empty response:

    gcloud ml speech recognize gs://MY_BUCKET/audioClip.mp3 --language-code=en-US --encoding=linear16 --sample-rate=48000
    
    {}
    

    After that, I converted the file to WAV (for .wav output, ffmpeg defaults to 16-bit PCM, which is what LINEAR16 expects):

    ffmpeg -i audioClip.mp3 audioClip.wav
    

    Then I tried again and voilà:

    gcloud ml speech recognize gs://MY_BUCKET/audioClip.wav --language-code=en-US --encoding=linear16 --sample-rate=48000
    
    {
      "results": [
        {
          "alternatives": [
            {
              "confidence": 0.7809482,
              "transcript": "testing testing 1 2 3"
            }
          ]
        }
      ]
    }
    

    Note that, according to this documentation, MP3 encoding is a Beta feature that is only available in v1p1beta1. So you should convert your file before sending it to the Speech-to-Text API.