Tags: azure, azure-cognitive-services, microsoft-translator, microsoft-speech-api

Microsoft Translator Speech missing punctuation


I am using the MS Translator Speech WebSocket API for real-time speech recognition and translation. The problem is that the recognised text sometimes has no punctuation (commas, full stops, etc.). The transcribed text looks good otherwise. I also receive an MP3 with the synthesised translation.

It looks completely random: I can send the same audio multiple times, and some responses have punctuation while others do not. I am sending the audio in the correct format and at a near real-time rate, e.g. I send 100 ms chunks every ~100 ms. The recognised language is Spanish.
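For reference, the pacing I describe looks roughly like the sketch below. It is only an illustration: it assumes 16 kHz, 16-bit, mono PCM (so a 100 ms chunk is 3200 bytes), and `send` is a hypothetical callable standing in for whatever binary-send method the WebSocket client exposes. The names `chunk_pcm` and `send_paced` are mine, not part of any Microsoft SDK.

```python
import time
from typing import Callable, Iterator


def chunk_pcm(pcm: bytes, sample_rate: int = 16000, sample_width: int = 2,
              channels: int = 1, chunk_ms: int = 100) -> Iterator[bytes]:
    """Split raw PCM bytes into chunks of chunk_ms milliseconds each."""
    # Assumed format: 16 kHz * 2 bytes * 1 channel * 0.1 s = 3200 bytes.
    bytes_per_chunk = sample_rate * sample_width * channels * chunk_ms // 1000
    for start in range(0, len(pcm), bytes_per_chunk):
        yield pcm[start:start + bytes_per_chunk]


def send_paced(pcm: bytes, send: Callable[[bytes], None], chunk_ms: int = 100,
               sleep: Callable[[float], None] = time.sleep,
               clock: Callable[[], float] = time.monotonic) -> int:
    """Send one chunk per chunk_ms, pacing against a running deadline.

    `send` is a placeholder for the WebSocket client's binary send.
    Returns the number of chunks sent.
    """
    next_deadline = clock()
    sent = 0
    for chunk in chunk_pcm(pcm, chunk_ms=chunk_ms):
        send(chunk)
        sent += 1
        # Advance a fixed deadline instead of sleeping a flat 100 ms,
        # so time spent inside send() does not accumulate as drift.
        next_deadline += chunk_ms / 1000
        delay = next_deadline - clock()
        if delay > 0:
            sleep(delay)
    return sent
```

Injecting `sleep` and `clock` is just to make the pacing easy to test without waiting in real time; in actual use the defaults apply.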

Is this a common issue or is there some other catch?


Solution

  • Switching to the Speech Preview API solved the missing punctuation. For now only SDKs are available, and the raw WebSocket API is not yet documented. I have managed to connect to and use the WS API; there is more info in another SO question.