I'm exploring the capabilities of the Whisper API and was wondering if it can be used to generate an .SRT file with transcriptions. From what I understand, this transcription to .SRT can be achieved when running the model locally using the Whisper package. Unfortunately, I don't possess the computational resources to run the model locally, so I'm leaning towards using the API directly.
Has anyone had experience with this or can provide guidance on how to approach it through the API?
The following Python script can be used as a starting point, but the question is about the capabilities of the model itself, not specific to any programming language.
import os
import openai

# Read the API key from the environment rather than hard-coding it.
openai.api_key = os.environ["OPENAI_API_KEY"]

# Open the audio file in binary mode and send it to the Whisper model.
audio_file = open("audio.mp3", "rb")
transcript = openai.Audio.transcribe("whisper-1", audio_file)
print(transcript.text)
A cursory look at OpenAI's docs shows that srt is a supported value for the response_format parameter on the /v1/audio/transcriptions endpoint.
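Since your question is about the API itself rather than any particular client, here's a minimal sketch of calling that endpoint directly with the requests library; the audio.mp3 file name is carried over from your example, and the script assumes your key is in the OPENAI_API_KEY environment variable:

import os
import requests

# Call the transcription endpoint directly, asking for SRT output.
response = requests.post(
    "https://api.openai.com/v1/audio/transcriptions",
    headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
    data={"model": "whisper-1", "response_format": "srt"},
    files={"file": open("audio.mp3", "rb")},
)
response.raise_for_status()

# With response_format=srt, the response body is the SRT text itself, not JSON.
print(response.text)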
With the official Python bindings you're using in your example, you should be able to pass this as a named parameter to your openai.Audio.transcribe() invocation:
transcript = openai.Audio.transcribe("whisper-1", audio_file, response_format="srt")
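Note that when you request a non-JSON format like srt, the return value is the SRT content itself rather than an object with a .text attribute, so you'd save it to disk along these lines (a sketch, assuming the bindings pass the raw response body through for non-JSON formats):

transcript = openai.Audio.transcribe("whisper-1", audio_file, response_format="srt")

# Write the SRT-formatted transcription to a subtitle file.
with open("audio.srt", "w", encoding="utf-8") as srt_file:
    srt_file.write(str(transcript))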