I'm trying to do the following with an SRT (subtitles) file:
I have to do that on the dataframe dfClean, which holds the edited timestamp fields, and then do the same on dfSRTForm, the dataframe with the original SRT time format, so I can export the latter as an SRT file later.
My code to do that is this:
    for i in dfClean.index:
        while dfClean.at[i, 'Difference'] < 5:
            dfClean.at[i, 'Text'] = dfClean.at[i, 'Text'] + ' ' + dfClean.at[i+1, 'Text']
            dfSRTForm.at[i, 'Text'] = dfSRTForm.at[i, 'Text'] + ' ' + dfSRTForm.at[i+1, 'Text']
            dfClean.at[i, 'End_Time'] = dfClean.at[i+1, 'End_Time']
            dfSRTForm.at[i, 'End_Time'] = dfSRTForm.at[i+1, 'End_Time']
            dfClean = dfClean.drop(i+1)
            dfSRTForm = dfSRTForm.drop(i+1)
But I get this error:
KeyError: 3
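For anyone hitting the same error, here is a minimal sketch of what I believe is going on, using a hypothetical toy dataframe rather than my real data: drop removes a row but leaves the remaining index labels unchanged, so a later lookup of the dropped label fails.

```python
import pandas as pd

# Hypothetical toy data standing in for dfClean
df = pd.DataFrame({"Text": ["a", "b", "c", "d", "e"]})

df = df.drop(3)                    # remove the row labelled 3
remaining_labels = list(df.index)  # the other labels are not renumbered

try:
    df.at[3, "Text"]               # label 3 no longer exists
    lookup_failed = False
except KeyError:
    lookup_failed = True
```

In my loop, the second pass of the while looks up i+1 again after that row was already dropped, which would explain the KeyError.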
UPDATE (keeping the previous version in case anyone else is having the same issue):
I found a way to reset the index to avoid KeyError: 3.
My current code is:
    for i in dfClean.index:
        while dfClean.at[i, 'Difference'] < 5:
            dfClean.at[i, 'Text'] = dfClean.at[i, 'Text'] + ' ' + dfClean.at[i+1, 'Text']
            dfSRTForm.at[i, 'Text'] = dfSRTForm.at[i, 'Text'] + ' ' + dfSRTForm.at[i+1, 'Text']
            dfClean.at[i, 'End_Time'] = dfClean.at[i+1, 'End_Time']
            dfSRTForm.at[i, 'End_Time'] = dfSRTForm.at[i+1, 'End_Time']
            dfClean = dfClean.drop(i+1)
            dfSRTForm = dfSRTForm.drop(i+1)
            dfClean = dfClean.reset_index()
            dfClean = dfClean.drop(columns='index')
            dfSRTForm = dfSRTForm.reset_index()
            dfSRTForm = dfSRTForm.drop(columns='index')
            dfClean['Difference'] = (dfClean['End_Time'] - dfClean['Start_Time']).astype('timedelta64[s]')
But now I get KeyError: 267, and I'm pretty sure it's because the loop condenses the rows to 266.
Is there a way to put "or end of index" or "or last row" in the while loop without hard-coding the 266 lines? I want to use it for other SRT files with a varying number of rows.
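One way to express "or last row" without a hard-coded count, as a minimal sketch rather than a tested solution: drive the loop with a counter and re-check len(df) on every pass. The function name merge_short_rows and the sample data are hypothetical, and the sketch assumes Start_Time and End_Time hold pandas Timedelta (or Timestamp) values. It handles a single dataframe; the same merge and drop would have to be mirrored on dfSRTForm.

```python
import pandas as pd

def merge_short_rows(df, min_sec=5):
    """Merge each row shorter than min_sec seconds into the row after it."""
    df = df.reset_index(drop=True)
    i = 0
    while i < len(df) - 1:  # stop before the last row: it has no next row
        duration = (df.at[i, "End_Time"] - df.at[i, "Start_Time"]).total_seconds()
        if duration < min_sec:
            df.at[i, "Text"] = df.at[i, "Text"] + " " + df.at[i + 1, "Text"]
            df.at[i, "End_Time"] = df.at[i + 1, "End_Time"]
            # Remove the merged row and renumber, so i + 1 is the next row again
            df = df.drop(i + 1).reset_index(drop=True)
        else:
            i += 1  # row is long enough, move on
    return df

# Hypothetical sample: four subtitle lines with times given in seconds
sample = pd.DataFrame({
    "Start_Time": pd.to_timedelta([0, 2, 4, 10], unit="s"),
    "End_Time": pd.to_timedelta([2, 4, 10, 12], unit="s"),
    "Text": ["a", "b", "c", "d"],
})
merged = merge_short_rows(sample, min_sec=5)
```

Because the loop condition reads the current length each time, it shrinks along with the dataframe. A short final row is left as it is, since there is nothing after it to merge with.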
This is how I ended up fixing it:
    indexKeep = len(dfClean.index)
    minSec = 3  # min number of seconds of screen time per line of subtitles.
    for i in range(0, indexKeep):
        try:
            while dfClean.at[i, 'Difference'] < minSec:
                dfClean.at[i, 'Text'] = dfClean.at[i, 'Text'] + ' ' + dfClean.at[i+1, 'Text']
                dfSRTForm.at[i, 'Text'] = dfSRTForm.at[i, 'Text'] + ' ' + dfSRTForm.at[i+1, 'Text']
                dfClean.at[i, 'End_Time'] = dfClean.at[i+1, 'End_Time']
                dfSRTForm.at[i, 'End_Time'] = dfSRTForm.at[i+1, 'End_Time']
                dfClean = dfClean.drop(i+1)
                dfSRTForm = dfSRTForm.drop(i+1)
                dfClean = dfClean.reset_index()
                dfClean = dfClean.drop(columns='index')
                dfSRTForm = dfSRTForm.reset_index()
                dfSRTForm = dfSRTForm.drop(columns='index')
                dfClean['Difference'] = (dfClean['End_Time']-dfClean['Start_Time']).astype('timedelta64[s]')
                dfClean.at[i, 'ID'] = i+1
                dfSRTForm.at[i, 'ID'] = i+1
                indexKeep = len(dfClean.index)
        except KeyError:  # Takes care of condensed number of rows
            pass
This deletes the next row and resets the index numbers so you don't get stuck on a KeyError in the middle, and the try/except then takes care of the KeyError at the end. The one at the end happens because range(0, indexKeep) is evaluated once, when the for loop starts, so the loop is set up to run for over 800 lines; the condensing that the loop does brings the total down to about 400 lines, which means it eventually can't find "401" when it gets there.
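A small illustration of that last point, with made-up numbers instead of my real row counts: the range object is built once from the value the variable has at that moment, so reassigning the variable inside the loop does not shorten it.

```python
n = 8          # stands in for the original indexKeep
visited = []

for i in range(0, n):   # range(0, 8) is created here, once
    n = 4               # reassigning n later has no effect on the loop
    visited.append(i)

loop_count = len(visited)
```

That is why updating indexKeep inside the loop does not stop it early, and why the except KeyError is still needed in the version above.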