Can we use an MP3 audio file with the Watson Speech to Text API?
Which popular audio formats are not supported by the Watson Speech to Text API?
I suggest using the WAV format, since it is a popular format, though the best choice depends on your use case.
If you really need to use MP3, you can simply convert the MP3 to WAV.
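One way to do the MP3-to-WAV conversion is to call the ffmpeg command-line tool from a script. This is only a minimal sketch: it assumes ffmpeg is installed and on your PATH, the function names are made up for illustration, and the 16 kHz mono output is an assumption (a common choice for speech audio), not something the service requires.

```python
import subprocess


def build_ffmpeg_command(src, dst, sample_rate=16000, channels=1):
    """Build the ffmpeg argument list that converts src to dst.

    The output format is inferred by ffmpeg from the dst extension (.wav).
    sample_rate and channels are illustrative defaults, not service requirements.
    """
    return [
        "ffmpeg",
        "-y",                     # overwrite dst if it already exists
        "-i", src,                # input file (e.g. an MP3)
        "-ar", str(sample_rate),  # output sample rate in Hz
        "-ac", str(channels),     # number of output channels
        dst,
    ]


def convert_mp3_to_wav(src, dst, **options):
    """Run ffmpeg; raises CalledProcessError if the conversion fails."""
    subprocess.run(build_ffmpeg_command(src, dst, **options), check=True)
```

Calling convert_mp3_to_wav("speech.mp3", "speech.wav") would then produce a WAV file that you can send with the audio/wav content type.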
The formats that Speech to Text supports are:
audio/flac: Free Lossless Audio Codec (FLAC), a lossless compressed audio coding format. For more information, see en.wikipedia.org/wiki/FLAC.
audio/l16: Linear 16-bit Pulse-Code Modulation (PCM), an uncompressed audio data format. Use this media type to pass a raw PCM file. Note that linear PCM audio can also reside inside a container Waveform Audio File Format (WAV) file. For more information, see the Internet Engineering Task Force (IETF) Request for Comment (RFC) 2586 and en.wikipedia.org/wiki/Pulse-code_modulation.
audio/wav: Waveform Audio File Format (WAV), a standard created by Microsoft® and IBM. A WAV file is a container that is often used for uncompressed audio bitstreams but can contain compressed audio, as well. For more information, see en.wikipedia.org/wiki/WAV. The service supports WAV files that use any encoding. It accepts audio with a maximum of nine channels (due to an FFmpeg limitation).
audio/ogg, audio/ogg;codecs=opus, audio/ogg;codecs=vorbis: Ogg is a free, open container format maintained by the Xiph.org Foundation; for more information, see www.xiph.org/ogg/. Both codecs are free, open, lossy audio-compression formats. Opus is the preferred codec. If you omit the codec, the service automatically detects it from the input audio.
audio/webm, audio/webm;codecs=opus, audio/webm;codecs=vorbis: Web Media (WebM) is an open media-file format; for more information, see webmproject.org. WebM supports audio streams compressed with the Opus and Vorbis audio codecs; Opus is the preferred codec. If you omit the codec, the service automatically detects it from the input audio. The documentation also references JavaScript code that shows how to capture audio from a microphone in a Chrome browser and encode it into a WebM data stream.
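If you pick the content type in code, a small lookup based on the list above can catch an unsupported file before you send it. This is just a sketch: the function name and the extension-to-type mapping are my own assumptions, it only covers the container formats listed here, and audio/l16 is left out because raw PCM files have no standard extension.

```python
import os

# Extension -> media type, taken only from the formats listed in this answer.
SUPPORTED_CONTENT_TYPES = {
    ".flac": "audio/flac",
    ".wav": "audio/wav",
    ".ogg": "audio/ogg",    # codec omitted, so the service detects it
    ".webm": "audio/webm",  # codec omitted, so the service detects it
}


def content_type_for(path):
    """Return the media type for path, or raise ValueError if it is not listed."""
    extension = os.path.splitext(path)[1].lower()
    try:
        return SUPPORTED_CONTENT_TYPES[extension]
    except KeyError:
        raise ValueError(
            f"{extension or path!r} is not in the list above; convert the file first"
        ) from None
```

With this, content_type_for("speech.wav") gives "audio/wav", while a .mp3 path raises a ValueError, which is the point where you would convert the file.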
You can find all the formats described in more detail in the official Speech to Text documentation. I suggest you edit your question to add more details and read the documentation; IBM's documentation is usually very objective and complete.