Is it possible to turn an SRT
file, which is used for subtitles in videos into a dataset?
When imported into Excel, the SRT
file format looks like this:
1
00:00:03,000 --> 00:00:04,000
OVERLAPS PURE COINCIDENCE THAT
...
This pattern continues as time in the "video"/transcript goes on. I'd like to format the SRT
file this way:
number ; start ; end ; text
1 ; 00:00:03,000 ; 00:00:04,000 ; OVERLAPS PURE COINCIDENCE THAT
The VBA procedure below loads a standard .srt
(SubRip Movie Subtitle File) from a local file and splits it into rows/columns on the active Excel worksheet.
Sub importSRTfromFile(fName As String)
'Loads SRT from local file and converts to columns in Active Worksheet
Dim sIn As String, sOut As String, sArr() As String, x As Long
'load file
Open fName For Input As #1
While Not EOF(1)
Line Input #1, sIn
sOut = sOut & sIn & vbLf
Wend
Close #1
'convert LFs to delimiters & split into array
sOut = Replace(sOut, vbLf & vbLf, vbCr)
sOut = Replace(Replace(sOut, vbLf, "|"), " --> ", "|")
sArr = Split(sOut, vbCr)
'check if activesheet is blank
If ActiveSheet.UsedRange.Cells.Count > 1 Then
If MsgBox(UBound(sArr) & " rows found." & vbLf & vbLf & _
"Okay to clear worksheet '" & ActiveSheet.Name & "'?", _
vbOKCancel, "Delete Existing Data?") <> vbOK Then Exit Sub
ActiveSheet.Cells.ClearContents
End If
'breakout into rows
For x = 1 To UBound(sArr)
Range("A" & x) = sArr(x)
Next x
'split into columns
Columns("A:A").TextToColumns Destination:=Range("A1"), _
DataType:=xlDelimited, Other:=True, OtherChar:="|"
MsgBox "Imported " & UBound(sArr) & " rows from:" & vbLf & fName
End Sub
Example Usage:
Sub test_FileImport()
importSRTfromFile "c:\yourPath\yourFilename.srt"
End Sub
Alternatively, you can import an .srt
(or other similar text files) from a Website URL such as https://subtitle-index.org/ with this:
Sub importSRTfromWeb(url As String)
'Loads SRT from URL and converts to columns in Active Worksheet
Dim sIn As String, sOut As String, sArr() As String, rw As Long
Dim httpData() As Byte, XMLHTTP As Object
'load file from URL
Set XMLHTTP = CreateObject("MSXML2.XMLHTTP")
XMLHTTP.Open "GET", url, False
XMLHTTP.send
httpData = XMLHTTP.responseBody
Set XMLHTTP = Nothing
sOut = StrConv(httpData, vbUnicode)
'convert LFs to delimiters & split into array
sOut = Replace(sOut, vbLf & vbLf, vbCr)
sOut = Replace(Replace(sOut, vbLf, "|"), " --> ", "|")
sArr = Split(sOut, vbCr)
'check if activesheet is blank
If ActiveSheet.UsedRange.Cells.Count > 1 Then
If MsgBox(UBound(sArr) & " rows found." & vbLf & vbLf & _
"Okay to clear worksheet '" & ActiveSheet.Name & "'?", _
vbOKCancel, "Delete Existing Data?") <> vbOK Then Exit Sub
ActiveSheet.Cells.ClearContents
End If
'breakout into rows
For rw = 1 To UBound(sArr)
Range("A" & rw) = sArr(rw)
Next rw
'split into columns
Columns("A:A").TextToColumns Destination:=Range("A1"), _
DataType:=xlDelimited, Other:=True, OtherChar:="|"
MsgBox "Imported " & UBound(sArr) & " rows from:" & vbLf & url
End Sub
Sub testImport()
importSRTfromWeb _
"https://subtitle-index.org/download/4670541854528212663953859964/SRT/Pulp+Fiction"
End Sub
Many sites host free .srt
's; you may have to right-click the download button to copy the link (which may have an .srt
extension or might be a pointer, like the example above). The procedure won't work on .zip
'd files.
Wikipedia : SubRip & SRT
MSDN : Split Function (VBA)
Wikipedia : Newline characters
MSDN : UBound Function