Match timestamps in WebVTT files with sed


I have the following PCRE2 regex that matches the timestamp lines in a WebVTT (.vtt) subtitle file (the default format for YouTube) so that they can be removed:

^[0-9].:[0-9].:[0-9].+$

Removing the matched lines changes this:

00:00:00.126 --> 00:00:10.058
How are you today?

00:00:10.309 --> 00:00:19.272
Not bad, you?

00:00:19.559 --> 00:00:29.365
Been better.

To this:

How are you today?

Not bad, you?

Been better.

How would I convert this PCRE2 regex to an idiomatic (read: sane-looking) equivalent for sed's flavour of regex?


Solution

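  • Using your regex with sed (-E enables extended regular expressions; -n together with !p prints only the lines that do not match)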

    $ sed -En '/^[0-9].:[0-9].:[0-9].+$/!p' file
    How are you today?
    
    Not bad, you?
    
    Been better.
    

    Or, print only the lines that do not end with a digit (note that this also drops any subtitle line that happens to end in a digit)

    $ sed -n '/[0-9]$/!p' file
    How are you today?
    
    Not bad, you?
    
    Been better.
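
Both commands above are fairly loose: in the first, each unescaped . matches any character, and the second removes every line ending in a digit. A stricter variant can be sketched as follows, assuming every cue timing line has the hh:mm:ss.mmm --> hh:mm:ss.mmm shape shown in the question (WebVTT also allows timestamps without the hours field, which this pattern would not catch). The function name strip_timestamps and the sample input, including the "Chapter 2" line, are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: delete lines that look like a full cue timing line.
# Assumes the hh:mm:ss.mmm --> hh:mm:ss.mmm shape from the question.
# -E enables extended regular expressions, so {2} and {3} need no backslashes.
# With no file arguments, sed reads standard input.
strip_timestamps() {
    sed -E '/^[0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]{3} --> [0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]{3}/d' "$@"
}

# Made-up sample input; the last cue text ends in a digit on purpose.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
00:00:00.126 --> 00:00:10.058
How are you today?

00:00:10.309 --> 00:00:19.272
Not bad, you?

00:00:19.559 --> 00:00:29.365
Chapter 2
EOF

result=$(strip_timestamps "$tmp")
printf '%s\n' "$result"
rm -f "$tmp"
```

Here the d command deletes matching lines directly, so -n and !p are not needed. The "Chapter 2" line is kept, whereas the /[0-9]$/!p approach would drop it.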