regex notepad++ srt

regex insert character into blank SRT


I have a blank SRT file associated with a video. The timecodes have already been set in a transcription software platform (i.e. the boundaries of each caption have been set, but the captions have not been written). I have uploaded the video to YouTube, and now I want to add the blank SRT file to it so someone can transcribe it using YouTube's transcription/translation platform.

Blank SRT:

1
00:00:01,05 --> 00:00:04,64


2
00:00:05,02 --> 00:00:07,18


3
00:00:07,81 --> 00:00:11,03


4
00:00:11,04 --> 00:00:15,92


5
00:00:16,35 --> 00:00:17,11

But there is a problem: since there is no text in the captions, YouTube does not recognize the timecodes, so nothing happens when the SRT is uploaded to a video on YouTube.

To get around this, I place a single non-alphanumeric character in the blank line beneath each time code (usually a "-").

SRT file with dashes:

1
00:00:01,05 --> 00:00:04,64
-

2
00:00:05,02 --> 00:00:07,18
-

3
00:00:07,81 --> 00:00:11,03
-

4
00:00:11,04 --> 00:00:15,92
-

5
00:00:16,35 --> 00:00:17,11
-

This is a very manual process that can take a long time. There has to be a way to use find and replace in something like Notepad++ to simply add the dashes. I'm trying to do that using regex but am running into problems.

^$ correctly targets the blank lines; however, if I simply replace them with a "-" I get:

1
00:00:01,05 --> 00:00:04,64
-
-
2
00:00:05,02 --> 00:00:07,18
-
-

This is unacceptable since it breaks the syntax of the SRT: when an SRT like this is uploaded to YouTube, it treats the content as a single caption. Thus I need to place a dash ONLY in the first blank line, the one directly beneath the timecode.

I cannot figure out how to ONLY select the first blank line in each pair of blank lines. Any solutions would be appreciated.


Solution

  • You could match the timecode format at the end of the line, followed by a newline, and then assert the end of the line directly after it (so the match only succeeds when the next line is empty). Then replace with the full match followed by -

    Find what

    -->\h+\d\d:\d\d:\d\d,\d+\R$
    

    Replace with

    $0-
    

    Or the shorter variant, matching only the comma, the digits and a newline, followed by an end-of-line anchor:

    ,\d+\R$
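
If you would rather script this than run it by hand in Notepad++, the same idea can be sketched in Python. This is only a rough equivalent under a few assumptions: Python's re module has no \R or \h, so \r?\n and [ \t] stand in for them, and the names BLANK_CAPTION and fill_blank_captions are made up for this example.

```python
import re

# End of a timecode line plus its newline, but only when the line that
# follows is empty (or the text ends there). Python's re has no \R or \h,
# so \r?\n and [ \t] are used as stand-ins.
BLANK_CAPTION = re.compile(r"(-->[ \t]+\d\d:\d\d:\d\d,\d+\r?\n)(?=\r?\n|\Z)")


def fill_blank_captions(srt_text: str, filler: str = "-") -> str:
    """Put `filler` on the empty line directly beneath each timecode."""
    # A function replacement avoids any backslash escaping issues in `filler`.
    return BLANK_CAPTION.sub(lambda m: m.group(1) + filler, srt_text)


if __name__ == "__main__":
    blank = (
        "1\n00:00:01,05 --> 00:00:04,64\n\n\n"
        "2\n00:00:05,02 --> 00:00:07,18\n\n"
    )
    print(fill_blank_captions(blank))
```

Captions that already contain text are left alone, because the lookahead only succeeds when the line after the timecode is empty, so running it twice does not add a second dash.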