I searched and found this: https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/captioning-concepts?pivots=programming-language-javascript
In the "Caption output format" section, it says:
The Speech service supports output formats such as SRT (SubRip Text) and WebVTT (Web Video Text Tracks).
But there is no option to set the output format in the API reference: https://eastus.dev.cognitive.microsoft.com/docs/services/speech-to-text-api-v3-0/operations/CreateTranscription
I am using the Create Transcription API to send video/audio files longer than 30 minutes, and Azure returns the transcription result as JSON like the following:
I'm planning to write a script to convert the transcription JSON to VTT, but it would be really helpful if that already exists or is something I can request as an output format.
You need a Speech resource key for this to work. Create a Speech service resource in the Azure portal, copy its key, and get the Python captioning sample code (captioning.py) that converts speech to text.
Set the key as an environment variable:
setx SPEECH_KEY your-key
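Note that setx only affects console windows opened after it runs, so open a new console before running the sample. A minimal sketch for checking that the key is visible to Python follows; the helper name load_speech_key is hypothetical and is not part of the captioning sample.

```python
import os


def load_speech_key(env=os.environ):
    """Return the Speech key from the environment, or fail with a clear message.

    `load_speech_key` is an illustrative helper, not part of the Azure sample.
    """
    key = env.get("SPEECH_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "SPEECH_KEY is not set. Run `setx SPEECH_KEY your-key` "
            "and open a new console window."
        )
    return key


if __name__ == "__main__":
    # Print only the key length so the secret itself is never echoed.
    print(f"SPEECH_KEY found ({len(load_speech_key())} characters)")
```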
Create captions from the speech:
Go to the directory that contains the sample code and install the Speech SDK:
pip install azure-cognitiveservices-speech
Run the application:
python captioning.py --input caption.this.mp4 --format any --output caption.output.txt --srt --realTime --threshold 5 --delay 0 --profanity mask --phrases "Contoso;Jessie;Rehaan"
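The --srt flag makes the sample write SRT output; omit it to get WebVTT, which is the sample's default. For details on the SRT format -> Link
Every Azure service has duration limits. Check the Speech service quotas and limits at the link.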
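If you stay with the Create Transcription API, the conversion script mentioned in the question can be fairly small. Below is a minimal sketch, not an official converter. It assumes the result JSON has a recognizedPhrases array whose entries carry offsetInTicks, durationInTicks and nBest[0].display, and that one tick is 100 nanoseconds; check these against your own result file. The function names and the file name transcription.json are hypothetical.

```python
import json

TICKS_PER_MS = 10_000  # assumption: one tick is 100 nanoseconds


def ticks_to_timestamp(ticks):
    """Format a tick count as a WebVTT timestamp (HH:MM:SS.mmm)."""
    total_ms = int(ticks) // TICKS_PER_MS
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, ms = divmod(rest, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{ms:03d}"


def transcription_to_vtt(result):
    """Build WebVTT text from a parsed transcription result (assumed layout)."""
    lines = ["WEBVTT", ""]
    phrases = sorted(
        result.get("recognizedPhrases", []),
        key=lambda p: p["offsetInTicks"],
    )
    for phrase in phrases:
        if not phrase.get("nBest"):
            continue  # skip phrases without a recognition candidate
        start = int(phrase["offsetInTicks"])
        end = start + int(phrase["durationInTicks"])
        lines.append(f"{ticks_to_timestamp(start)} --> {ticks_to_timestamp(end)}")
        lines.append(phrase["nBest"][0]["display"])
        lines.append("")  # a blank line ends each cue
    return "\n".join(lines)


if __name__ == "__main__":
    # "transcription.json" is a placeholder for the downloaded result file.
    with open("transcription.json", encoding="utf-8") as f:
        print(transcription_to_vtt(json.load(f)))
```

Each recognized phrase becomes one cue, so long phrases give long captions; splitting them by word timings would be a further step if shorter cues are needed.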