Need to modify all URLs in an m3u8 file with PHP


I have an m3u8 file which goes something like this:

#EXTM3U
#EXT-X-VERSION:3
#EXT-X-MEDIA-SEQUENCE:15084
#EXT-X-TARGETDURATION:4
#EXTINF:4.004,
radio002live-15084.ts
#EXTINF:4.004,
radio002live-15085.ts
(and so on)

What I ideally want to happen is to have all of those file names prefixed with a URL, but only if they don't start with HTTP(S) already. Then URL encode those, add another thing in front of them, and then return that so ideally the file looks like this:

#EXTM3U
#EXT-X-VERSION:3
#EXT-X-MEDIA-SEQUENCE:15084
#EXT-X-TARGETDURATION:4
#EXTINF:4.004,
proxy.php?http%3A%2F%2Fsomeurl.tld%2Fpath%2Fradio002live-15084.ts
#EXTINF:4.004,
proxy.php?http%3A%2F%2Fsomeurl.tld%2Fpath%2Fradio002live-15085.ts
(and so on)

So far, I've tried turning the thing into an array (one line = one key) but I realized this might not cover every m3u8 I need to parse (what if one has the URL on the same line?) and I can't seem to get past detecting what doesn't start with what regardless.

Ideally, this should work in PHP 5-ish.


Solution

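  • $contents = file_get_contents('/tmp/m3u8'); // read the whole playlist into one string
    // Callback: receives the matched line in $matches[0] and returns its replacement.
    function httpize($matches)
    {
        // Already an http(s) URL, or already wrapped in one (plain or URL-encoded): keep as is.
        if(preg_match('@(?:^|[?=])https?[:%]@', $matches[0])) return $matches[0];
        // Otherwise URL-encode the name and prepend the proxy plus the encoded base URL.
        return 'proxy.php?http%3A%2F%2Fsomeurl.tld%2Fpath%2F'.urlencode($matches[0]);
    }
    // The m modifier makes ^ and $ match at line boundaries; [^#] skips lines starting with #.
    echo preg_replace_callback('@^[^#].*$@m', 'httpize', $contents);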
    

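    The main entry point is preg_replace_callback(), called with three parameters: the pattern @^[^#].*$@m, where the m modifier lets ^ and $ match at the start and end of every line, so it picks out each line whose first character is not #; the name of the callback function, 'httpize'; and the string to process, $contents.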

    So what happens when preg_replace_callback() finds a match?
    It calls httpize, passing it a parameter containing the found string, wrapped in an array (array(0 => 'radio002live-15084.ts')). So to get the string found, we'll access $matches[0].
    Now we're in charge of returning the replacement for what we received as a parameter (remember we're the callback for preg_replace_callback()? It's waiting for us to return).
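To make the callback mechanics concrete, here is a minimal sketch that records what the callback receives and shows that its return value replaces the match. The three-line playlist and the bracket replacement are made up for illustration, and the anonymous function assumes PHP 5.3 or later (on older versions a named function, as in the answer, does the same job).

```php
<?php
// Hypothetical three-line playlist, used only for this illustration.
$playlist = "#EXTM3U\n#EXTINF:4.004,\nradio002live-15084.ts\n";

// Collect every $matches array the callback receives.
$seen = array();
$result = preg_replace_callback(
    '@^[^#].*$@m',
    function ($matches) use (&$seen) {
        $seen[] = $matches;              // one array per matched line
        return '[' . $matches[0] . ']';  // the return value replaces the match
    },
    $playlist
);

var_export($seen);   // shows the array(s) passed to the callback
echo "\n", $result;  // the # lines are untouched, the file name is bracketed
```

Only the file-name line reaches the callback; the two lines starting with # are never passed to it.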

    We start with an if(preg_match('@(?:^|[?=])https?[:%]@', $matches[0])) return $matches[0];.
    preg_match() will try to find in $matches[0] a string matching the regular expression (?:^|[?=])https?[:%] (notice that the @…@ has no m modifier this time: the function receives a single-line string, so m is not needed).

    Thus this preg_match() will return 1 (which the if treats as true) if it finds, in the line it received from preg_replace_callback(), an http or https that is either at the start of the line or preceded by a ? or an =, and that is followed by either a : or a % (the start of a URL-encoded :).
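The detection rule can be tried in isolation. The sketch below wraps the same pattern in a helper whose name, is_already_url(), is hypothetical (it does not appear in the answer) and runs it against the kinds of lines used in the demo.

```php
<?php
// Hypothetical helper name; the pattern is the one used inside httpize().
function is_already_url($line)
{
    // preg_match() returns 1 on a match and 0 otherwise.
    return preg_match('@(?:^|[?=])https?[:%]@', $line) === 1;
}

$samples = array(
    'radio002live-15084.ts',                      // plain file name
    'http://radio002live-15085.ts',               // starts with http:
    'httpradio002live-15085.ts',                  // "http" not followed by : or %
    'thisoneisalreadyon?url=http%3A%2F%2Fsomeurl.tld%2Fradio002live-15085.ts',
);
foreach ($samples as $line) {
    echo $line, ' => ', is_already_url($line) ? 'leave as is' : 'wrap', "\n";
}
```

Note that "httpradio002live-15085.ts" is treated as a plain file name, because the characters after http are neither : nor %.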

    If it finds one, the file name is already a full URL or has already been wrapped in one, so we return $matches[0]; unchanged.

    On the other hand, if preg_match() returns 0 (so the if does not return early), our httpize() function transforms the received string by urlencode()ing it and prefixing it with a fixed URL prefix.
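One caveat worth knowing about: in the pattern @^[^#].*$@m, the class [^#] can also match a line break, so a blank line followed by a # line may be swallowed into one match, and with Windows line endings the trailing \r ends up inside the match. The sketch below is a possible hardened variant, not the answer's original code. The function name wrap_playlist_urls() and its $prefix parameter are assumptions made for this example, and the closure needs PHP 5.3 or later.

```php
<?php
// Hypothetical wrapper around the same idea, taking the prefix as a parameter.
function wrap_playlist_urls($contents, $prefix)
{
    $callback = function ($m) use ($prefix) {
        // Same detection rule as httpize(): leave existing URLs alone.
        if (preg_match('@(?:^|[?=])https?[:%]@', $m[0])) {
            return $m[0];
        }
        return $prefix . urlencode($m[0]);
    };
    // [^#\r\n] keeps the first character from being a line break, and
    // [^\r\n]* stops before a trailing \r, so blank lines and CRLF are safe.
    return preg_replace_callback('@^[^#\r\n][^\r\n]*@m', $callback, $contents);
}

// Made-up CRLF playlist containing a blank line.
$in = "#EXTM3U\r\n\r\n#EXTINF:4.004,\r\nradio002live-15084.ts\r\n";
echo wrap_playlist_urls($in, 'proxy.php?http%3A%2F%2Fsomeurl.tld%2Fpath%2F');
```

For playlists that use plain \n line endings and contain no blank lines, this should behave the same as the original pattern.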

    Demo

    with input:

    #EXTM3U
    #EXT-X-VERSION:3
    #EXT-X-MEDIA-SEQUENCE:15084
    #EXT-X-TARGETDURATION:4
    #EXTINF:4.004,
    radio002live-15084.ts
    #EXTINF:4.004,
    radio002live-15085.ts
    #EXTINF:4.004,
    thisoneisalreadyon?url=http%3A%2F%2Fsomeurl.tld%2Fradio002live-15085.ts
    http://radio002live-15085.ts
    httpradio002live-15085.ts
    with spaces.ts
    

    will return (with PHP 5.6.25):

    #EXTM3U
    #EXT-X-VERSION:3
    #EXT-X-MEDIA-SEQUENCE:15084
    #EXT-X-TARGETDURATION:4
    #EXTINF:4.004,
    proxy.php?http%3A%2F%2Fsomeurl.tld%2Fpath%2Fradio002live-15084.ts
    #EXTINF:4.004,
    proxy.php?http%3A%2F%2Fsomeurl.tld%2Fpath%2Fradio002live-15085.ts
    #EXTINF:4.004,
    thisoneisalreadyon?url=http%3A%2F%2Fsomeurl.tld%2Fradio002live-15085.ts
    http://radio002live-15085.ts
    proxy.php?http%3A%2F%2Fsomeurl.tld%2Fpath%2Fhttpradio002live-15085.ts
    proxy.php?http%3A%2F%2Fsomeurl.tld%2Fpath%2Fwith+spaces.ts
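
The last line shows that urlencode() turns a space into +. As a rough check that nothing is lost, the sketch below decodes one wrapped entry back into the original URL. How proxy.php actually reads its query string is not shown here, so taking everything after the ? is an assumption.

```php
<?php
// One wrapped entry, copied from the output above.
$entry = 'proxy.php?http%3A%2F%2Fsomeurl.tld%2Fpath%2Fwith+spaces.ts';

// Assumption: the proxy treats everything after the first "?" as the target.
$query  = substr($entry, strpos($entry, '?') + 1);
$target = urldecode($query);   // urldecode() turns + back into a space

// rawurlencode() is the alternative that writes a space as %20 instead of +.
$raw = rawurlencode('with spaces.ts');

echo $target, "\n", $raw, "\n";
```

If whatever consumes these URLs expects %20 instead of +, swapping urlencode() for rawurlencode() in the callback is the usual adjustment.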