
Download m3u8 from URL using Python


I have started learning web scraping with Python, and I would like to download a video of the Japanese Diet (https://www.shugiintv.go.jp/jp/index.php?ex=VL&deli_id=40124&media_type=).

The player appears to load chunklist.m3u8 from playlist.m3u8 and then fetch the ts files listed in chunklist.m3u8 in order.

I want to download the contents of playlist.m3u8 first, then fetch chunklist.m3u8, download the ts files in order, and concatenate them.
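
The flow described above could be sketched roughly as follows. This is only a minimal sketch, not a tested downloader for this site: the helper names `playlist_uris` and `download_stream` are made up for illustration, it assumes the first entry in playlist.m3u8 is the variant you want, and it assumes the ts segments are unencrypted so that appending their bytes in order gives a playable file.

```python
from urllib.parse import urljoin


def playlist_uris(playlist_text, base_url):
    """Return absolute URLs for every URI line in an m3u8 playlist (hypothetical helper)."""
    uris = []
    for line in playlist_text.splitlines():
        line = line.strip()
        # Blank lines and lines starting with '#' are tags or comments, not URIs.
        if not line or line.startswith("#"):
            continue
        # URIs in a playlist are usually relative to the playlist's own URL.
        uris.append(urljoin(base_url, line))
    return uris


def download_stream(playlist_url, out_path):
    """Fetch playlist -> chunklist -> ts segments and append them to out_path."""
    import requests  # third-party; imported here so the parser above works without it

    master = requests.get(playlist_url)
    master.raise_for_status()
    # Use master.url (the final URL after any redirect) as the base.
    # Assumption: the first variant listed is the one to download.
    chunklist_url = playlist_uris(master.text, master.url)[0]

    chunklist = requests.get(chunklist_url)
    chunklist.raise_for_status()

    with open(out_path, "wb") as out:
        for segment_url in playlist_uris(chunklist.text, chunklist.url):
            segment = requests.get(segment_url)
            segment.raise_for_status()
            out.write(segment.content)
```

It would be called as `download_stream(url, "video.ts")`, where `video.ts` is an arbitrary output name. Streams that use an `#EXT-X-KEY` tag (encrypted segments) are not handled by this sketch.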

However, when I tried to download playlist.m3u8, the response was not the text I expected.

Here is a sample playlist.m3u8 URL:

http://hlsvod.shugiintv.go.jp/vod/_definst_/amlst:2011/2011-1207-0900-12/playlist.m3u8

code:

import requests

url = "http://hlsvod.shugiintv.go.jp/vod/_definst_/amlst:2011/2011-1207-0900-12/playlist.m3u8"
res = requests.get(url)
print(res.text)

expected text:

#EXTM3U
#EXT-X-VERSION:3
#EXT-X-STREAM-INF:BANDWIDTH=564000,NAME="500k",RESOLUTION=640x360
chunklist_w60346572_b564000_t64NTAwaw==.m3u8

actual text:

<html><head><title>Wowza Streaming Engine 4 Perpetual Bundle Unlimited Edition 4.7.7 build20181108145350</title></head><body>Wowza Streaming Engine 4 Perpetual Bundle Unlimited Edition 4.7.7 build20181108145350</body></html>

I suspect the colon in the URL is the problem, but I have no clear solution. How can I avoid the URL issue and successfully download the text of playlist.m3u8? Thanks.

Version:

Python 3.7.9

requests 2.25.1


Solution

  • Something is wrong with your URL:

    >>> url = "http://hlsvod.shugiintv.go.jp/vod/_definst_/amlst:2011/2011-1207-0900-12/playlist.m3u8"
    >>> res = requests.get(url)
    >>> res.request.url
    'https://hlsvod.shugiintv.go.jp/vod/_definst_/amlst:2011/2011-1207-0900-12/playlist.m3u8%20'
    

    See the "%20" at the end? That is a URL-encoded space, so the request goes to a different path than the one you intended.

    I am not sure how it got there, but copying and pasting this should work:

    url = 'https://hlsvod.shugiintv.go.jp/vod/_definst_/amlst:2011/2011-1207-0900-12/playlist.m3u8'
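
If the URL comes from a copy-paste or from scraped text, one way to guard against this is to strip surrounding whitespace before making the request. The snippet below is a small sketch under the assumption that a stray trailing space is the cause; `clean_url` is a hypothetical helper name, not part of requests.

```python
def clean_url(url):
    """Return url without leading or trailing whitespace (hypothetical helper)."""
    # str.strip() with no arguments removes all Unicode whitespace,
    # including ordinary spaces, newlines, and non-breaking spaces.
    return url.strip()


# Same URL as above, but with a trailing space as it might arrive from a copy-paste.
raw = "https://hlsvod.shugiintv.go.jp/vod/_definst_/amlst:2011/2011-1207-0900-12/playlist.m3u8 "
url = clean_url(raw)

# repr() makes otherwise invisible trailing whitespace easy to spot.
print(repr(raw))
print(repr(url))
```

Printing `repr(url)` (or checking `res.request.url` as shown above) before debugging anything else is a quick way to confirm what is actually being requested.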