python-3.xspeech-recognitionspeech-to-textazure-cognitive-services

Microsoft Speech to Text Python SDK SPXERR_INVALID_HEADER issue


I'm getting the following error when using the Microsoft Python Speech-to-Text Quickstart ("Quickstart: Recognize speech from an audio file") with the azure-cognitiveservices-speech v1.8.0 SDK.

RuntimeError: Exception with an error code: 0xa (SPXERR_INVALID_HEADER)

There are just 3 inputs to this file:

I'm using the following test MP3 file:

Here's the full output:

Traceback (most recent call last):
  File "main.py", line 16, in <module>
    speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_input)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/azure/cognitiveservices/speech/speech.py", line 761, in __init__
    self._impl = self._get_impl(impl.SpeechRecognizer, speech_config, audio_config)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/azure/cognitiveservices/speech/speech.py", line 547, in _get_impl
    _impl = reco_type._from_config(speech_config._impl, audio_config._impl)
RuntimeError: Exception with an error code: 0xa (SPXERR_INVALID_HEADER)
[CALL STACK BEGIN]

3   libMicrosoft.CognitiveServices.Speech.core.dylib 0x0000000106ad88d2 CreateModuleObject + 1136482
4   libMicrosoft.CognitiveServices.Speech.core.dylib 0x0000000106ad7f4f CreateModuleObject + 1134047
5   libMicrosoft.CognitiveServices.Speech.core.dylib 0x00000001069d1803 CreateModuleObject + 59027
6   libMicrosoft.CognitiveServices.Speech.core.dylib 0x00000001069d1503 CreateModuleObject + 58259
7   libMicrosoft.CognitiveServices.Speech.core.dylib 0x0000000106a11c64 CreateModuleObject + 322292
8   libMicrosoft.CognitiveServices.Speech.core.dylib 0x0000000106a10be5 CreateModuleObject + 318069
9   libMicrosoft.CognitiveServices.Speech.core.dylib 0x0000000106a0e5a2 CreateModuleObject + 308274
10  libMicrosoft.CognitiveServices.Speech.core.dylib 0x0000000106a0e7c3 CreateModuleObject + 308819
11  libMicrosoft.CognitiveServices.Speech.core.dylib 0x0000000106960bc7 recognizer_create_speech_recognizer_from_config + 3863
12  libMicrosoft.CognitiveServices.Speech.core.dylib 0x000000010695fd74 recognizer_create_speech_recognizer_from_config + 196
13  _speech_py_impl.so                  0x00000001067ff35b PyInit__speech_py_impl + 814939
14  _speech_py_impl.so                  0x000000010679b530 PyInit__speech_py_impl + 405808
15  Python                              0x00000001060f65dc _PyMethodDef_RawFastCallKeywords + 668
16  Python                              0x00000001060f5a5a _PyCFunction_FastCallKeywords + 42
17  Python                              0x00000001061b45a4 call_function + 724
18  Python                              0x00000001061b1576 _PyEval_EvalFrameDefault + 25190
19  Python                              0x00000001060f5e90 function_code_fastcall + 128
20  Python                              0x00000001061b45b2 call_function + 738
21  Python                              0x00000001061b1576 _PyEval_EvalFrameDefault + 25190
22  Python                              0x00000001061b50d6 _PyEval_EvalCodeWithName + 2422
23  Python                              0x00000001060f55fb _PyFunction_FastCallDict + 523
24  Python                              0x00000001060f68cf _PyObject_Call_Prepend + 143
25  Python                              0x0000000106144d51 slot_tp_init + 145
26  Python                              0x00000001061406a9 type_call + 297
27  Python                              0x00000001060f5871 _PyObject_FastCallKeywords + 433
28  Python                              0x00000001061b4474 call_function + 420
29  Python                              0x00000001061b16bd _PyEval_EvalFrameDefault + 25517
30  Python                              0x00000001061b50d6 _PyEval_EvalCodeWithName + 2422
31  Python                              0x00000001061ab234 PyEval_EvalCode + 100
32  Python                              0x00000001061e88f1 PyRun_FileExFlags + 209
33  Python                              0x00000001061e816a PyRun_SimpleFileExFlags + 890
34  Python                              0x00000001062079db pymain_main + 6875
35  Python                              0x0000000106207f2a _Py_UnixMain + 58
36  libdyld.dylib                       0x00007fff5d8aaed9 start + 1
37  ???                                 0x0000000000000002 0x0 + 2

Can anyone provide some pointers on what header this is referring to and how to resolve this.


Solution

  • mp3-encoded audio is not supported as an input format. Please use a WAV(PCM) file with 16-bit samples, 16 kHz sample rate, and a single channel (Mono).