I intend to build an audio recognizer and turn its output into subtitles, but I need to format the result so that it looks like a .srt subtitle file.
For example, I have this code that tries to recognize .mp4 audio previously converted to .wav format:
import speech_recognition as sr
r = sr.Recognizer()
intro = sr.AudioFile('intro.wav')
with intro as source:
    r.adjust_for_ambient_noise(source)
    audio = r.record(source)
result = r.recognize_google(audio)
print(result)
Result returned from audio recognition:
this is chapter 1 of the flask Mega tutorial welcome before we begin I want to spend a couple of minutes showing you how you can work with the core repository on GitHub while you do this tutorial...
What I still don't know is how to get the pause points and use them to turn this text into a subtitle file like this:
1
00:00:01,510 --> 00:00:05,860
This is Chapter 1 of the flask make a tutorial welcome.

2
00:00:05,860 --> 00:00:11,270
Before we begin I want to spend a couple of minutes showing you how you can work.
Hi Oliveria, with the Python library you're using, what you're asking is kind of involved. As I've been asked to not provide a full implementation, I won't, but here's where you need to look.
The library defines a layer of abstraction over the actual Google API for speech recognition. Although the API serves a JSON file upon request that does contain the timestamp data for each word, the recognize_google method discards this information by default and only returns the transcript. The full signature of the method:
def recognize_google(self, audio_data, key=None, language="en-US", pfilter=0, show_all=False, with_confidence=False):
Happily, you can set the "show_all" flag to retrieve the full JSON as a dict:
result = r.recognize_google(audio_data = audio, show_all = True)
That being said, note that now you face a new challenge: you have to extract the transcript from the dict and somehow compose time codes from the individual words. If you wish to pursue it anyway, some of the code written below (from the previous iteration of this answer) can be repurposed to accomplish part of that challenge.
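As a rough sketch of those two steps, and nothing more: the snippet below assumes the dict returned with show_all=True has an "alternative" list whose entries carry a "transcript" key (check what your version of the library actually returns), and it assumes you have somehow obtained per-word timings as (word, start_ms, end_ms) tuples. That tuple shape and both helper names are hypothetical, made up for illustration.

```python
from typing import List, Tuple

Word = Tuple[str, int, int]  # hypothetical shape: (word, start_ms, end_ms)
Cue = Tuple[str, int, int]   # (text, start_ms, end_ms)


def best_transcript(response) -> str:
    """Return the first transcript of a show_all-style response, or ""."""
    # Assumption: response looks like {"alternative": [{"transcript": "..."}]}.
    if not isinstance(response, dict):
        return ""
    alternatives = response.get("alternative") or []
    if not alternatives:
        return ""
    return alternatives[0].get("transcript", "")


def group_words(words: List[Word], max_gap_ms: int = 500, max_words: int = 10) -> List[Cue]:
    """Start a new cue after a pause longer than max_gap_ms or after max_words words."""
    cues: List[Cue] = []
    current: List[Word] = []

    def flush() -> None:
        # Turn the words collected so far into one cue.
        if current:
            text = " ".join(w[0] for w in current)
            cues.append((text, current[0][1], current[-1][2]))
            current.clear()

    for word in words:
        # The pause is the gap between the previous word's end and this word's start.
        if current and (word[1] - current[-1][2] > max_gap_ms or len(current) >= max_words):
            flush()
        current.append(word)
    flush()
    return cues
```

Each resulting cue carries a start and an end in milliseconds, so the duration needed further down is simply end minus start. The 500 ms pause threshold is an arbitrary starting point to tune by ear.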
If you could provide a sample of the source text you are trying to format, it would be easier to provide a proper solution.
However, here's an explanation of an approach you could take.
Piece by piece the program you're asking for should:
- Take the raw text (str) and its time code.
- Build a str based on that and format it according to the SubRip (srt) format guide, i.e.

1
00:02:16,612 --> 00:02:19,376
some text

2
00:02:19,482 --> 00:02:21,609
some other text

etc.
- Write the result to a *.srt file.

Luckily for you, all of those can be accomplished in native Python without much hassle. To keep it simple, I'll offer a procedural, function-based approach.
What we need is a function that takes the text (str) for each section of the subtitles, along with its time frame, and appends the formatted subs to a file. We can achieve this with Python's built-in context managers, some for loops and string formatting.
As it stands, we need some way of knowing which "raw text" entry corresponds to which section of the srt file. Some encapsulation would probably prove useful, but since this is a simple program we can make do with a global variable, as the shared state is minimal.
As no further info was provided, I'll assume you can create an ordered list of tuples that contain: the "raw text" you wish to format, the initial time code and the duration in milliseconds for each section.
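For illustration only, this is what such a list could look like for the two cues shown in the question, with each duration computed as end time minus start time. The helper to_millis is a hypothetical name, not part of any library, and it assumes time codes in the hh:mm:ss,mmm form.

```python
from typing import List, Tuple


def to_millis(time_code: str) -> int:
    """Convert an "hh:mm:ss,mmm" time code to milliseconds."""
    hours, minutes, rest = time_code.split(":")
    seconds, millis = rest.split(",")
    return ((int(hours) * 60 + int(minutes)) * 60 + int(seconds)) * 1000 + int(millis)


# Each entry: (raw text, initial time code, duration in milliseconds)
RAW_TEXT_LIST: List[Tuple[str, str, int]] = [
    (
        "This is Chapter 1 of the flask make a tutorial welcome.",
        "00:00:01,510",
        to_millis("00:00:05,860") - to_millis("00:00:01,510"),
    ),
    (
        "Before we begin I want to spend a couple of minutes showing you how you can work.",
        "00:00:05,860",
        to_millis("00:00:11,270") - to_millis("00:00:05,860"),
    ),
]
```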
import os
from typing import List, Tuple

def main():
    # shared state ("globals") used by the nested helper functions
    # each entry is (raw text, initial time code "hh:mm:ss,mmm", duration in milliseconds)
    RAW_TEXT_LIST: List[Tuple[str, str, int]] = []  # assign each section of the text right here
    NAME_OF_FILE: str = "THE_NAME_OF_THE_SUB_FILE_TO_WRITE"  # change this
    currentSection: int = 0

    def convertMillisToTc(millis: int) -> str:
        # utility function to convert milliseconds to a time code hh:mm:ss,mmm
        seconds, milliseconds = divmod(millis, 1000)
        minutes, seconds = divmod(seconds, 60)
        hours, minutes = divmod(minutes, 60)
        return f"{hours:02d}:{minutes:02d}:{seconds:02d},{milliseconds:03d}"

    def makeSubRipStr(rawText: str, initialTimeCode: str, durationInMilliseconds: int) -> str:
        nonlocal currentSection  # the counter lives in main(), so it must be declared nonlocal
        currentSection += 1  # we add 1 to the currentSection counter, so numbering starts at 1
        # split "hh:mm:ss,mmm" into its four numeric fields
        hours, minutes, rest = initialTimeCode.split(":")
        seconds, milliseconds = rest.split(",")
        initialTimeCodeInMillis: int = (3600000 * int(hours) + 60000 * int(minutes)
                                        + 1000 * int(seconds) + int(milliseconds))
        finalTimeCode: str = convertMillisToTc(initialTimeCodeInMillis + durationInMilliseconds)
        formattedText: str = f'{currentSection}\n{initialTimeCode} --> {finalTimeCode}\n{rawText}\n\n'
        return formattedText

    # make sure the output directory exists, then create (or empty) the file
    os.makedirs("./subfiles", exist_ok=True)
    with open(file=f"./subfiles/{NAME_OF_FILE}.srt", mode="w", encoding="utf-8") as subFile:
        pass
    # open the file in append mode and add each formatted entry
    with open(file=f"./subfiles/{NAME_OF_FILE}.srt", mode="a+", encoding="utf-8") as subFile:
        for sourceTuple in RAW_TEXT_LIST:
            text, initialTC, duration = sourceTuple
            subFile.write(makeSubRipStr(text, initialTC, duration))

if __name__ == '__main__':
    main()
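If you want to check the formatting without writing anything to disk, the same idea can be sketched as a pure function that returns the whole SRT text. This is a compact variant under the same assumptions (tuples of raw text, initial time code and duration in milliseconds); the function names are illustrative, and the sample values are the two cues from the format example earlier in this answer.

```python
from typing import Iterable, Tuple


def millis_to_tc(millis: int) -> str:
    """Convert milliseconds to an "hh:mm:ss,mmm" time code."""
    seconds, ms = divmod(millis, 1000)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{ms:03d}"


def tc_to_millis(time_code: str) -> int:
    """Convert an "hh:mm:ss,mmm" time code to milliseconds."""
    hours, minutes, rest = time_code.split(":")
    seconds, ms = rest.split(",")
    return ((int(hours) * 60 + int(minutes)) * 60 + int(seconds)) * 1000 + int(ms)


def format_srt(entries: Iterable[Tuple[str, str, int]]) -> str:
    """Build numbered SRT blocks separated by a blank line."""
    blocks = []
    for index, (text, start_tc, duration_ms) in enumerate(entries, start=1):
        end_tc = millis_to_tc(tc_to_millis(start_tc) + duration_ms)
        blocks.append(f"{index}\n{start_tc} --> {end_tc}\n{text}\n")
    return "\n".join(blocks)


sample = [
    ("some text", "00:02:16,612", 2764),
    ("some other text", "00:02:19,482", 2127),
]
print(format_srt(sample))
```

Using enumerate for the section number avoids the shared counter altogether, which is one reason to prefer this shape once the program grows.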