Note that this question is different from How do we download a blob url video [closed] in that it requires no human interaction with the browser.
I have the following problem:
What I need to do:
Note that the solution would require no human interaction with the browser. API-wise, the input should be a list of URLs and the output a list of videos/gifs.
An example page can be found here in case you want to test your solution.
My understanding is that I can use Selene to get the HTML and click on the image to start the player. However, I have no idea how to process the blob to get the m3u8 and then use that one for the actual video.
With a little digging, it turns out you don't need to click any buttons. Clicking the button just requests the master.m3u8 file, and you can piece together the requested URL using dev tools. The catch is that this first file doesn't contain the links to the actual video, so you piece together a second request to get the final m3u8 file. From there, you can use the other SO links to download the video. It's segmented, so it's not a straightforward download. You can uncomment the print statements below to see what each m3u8 file contains. The code also loops through the pages.
import re

import requests
from bs4 import BeautifulSoup

for i in range(6119, 6121):
    url = 'https://www2.nhk.or.jp/signlanguage/sp/enquete.cgi?dno={}'.format(i)
    page = requests.get(url)
    soup = BeautifulSoup(page.text, 'html.parser')
    # locate the div that has the data we need
    div = soup.find(onclick=re.compile('signlanguage/movie'))
    print(div)
    # the video id is the second argument of the onclick handler
    video_id = div.get('onclick').split(',')[1].replace("'", "")
    m3u8_url = 'https://nhks-vh.akamaihd.net/i/signlanguage/movie/v4/{}/{}.mp4/master.m3u8'.format(video_id[-1], video_id)
    # this m3u8 file doesn't contain download links, the next one does; so download and save that one
    r = requests.get(m3u8_url)
    # print(r.text)
    m3u8_url_2 = r.text.split('\n')[2]  # get first link; high bandwidth
    r2 = requests.get(m3u8_url_2)
    # print(r2.text)
    # there are other ways to download the file, i'm just creating a new one with the data read and writing to a file
    fn = video_id + '.m3u8'
    with open(fn, 'w') as f:  # the with block closes the file, no explicit close() needed
        f.write(r2.text)
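To go from the saved playlist to an actual video file, the segments it lists still have to be fetched and joined. The following is a minimal sketch of that step, not a definitive implementation. It assumes the stream is unencrypted MPEG-TS (no `#EXT-X-KEY` tag), so that appending the segments in order yields a playable `.ts` file; I have not verified that against this particular site. The helper names `pick_variant`, `segment_urls` and `download_segments` are made up for illustration, and only the standard library is used.

```python
import re
import urllib.request
from urllib.parse import urljoin


def pick_variant(master_text, base_url):
    """Return the URL of the highest-BANDWIDTH variant in a master playlist."""
    best_bandwidth, best_url = -1, None
    lines = [ln.strip() for ln in master_text.splitlines()]
    for i, line in enumerate(lines):
        if not line.startswith('#EXT-X-STREAM-INF'):
            continue
        # Match BANDWIDTH but not AVERAGE-BANDWIDTH.
        match = re.search(r'[:,]BANDWIDTH=(\d+)', line)
        bandwidth = int(match.group(1)) if match else 0
        # The variant URI is the next non-empty line that is not a tag.
        for candidate in lines[i + 1:]:
            if candidate and not candidate.startswith('#'):
                if bandwidth > best_bandwidth:
                    best_bandwidth = bandwidth
                    best_url = urljoin(base_url, candidate)
                break
    return best_url


def segment_urls(media_text, base_url):
    """Return absolute segment URLs from a media playlist, in playback order."""
    urls = []
    for line in media_text.splitlines():
        line = line.strip()
        # Lines starting with '#' are tags or comments; the rest are URIs.
        if line and not line.startswith('#'):
            urls.append(urljoin(base_url, line))
    return urls


def download_segments(urls, out_path):
    """Fetch each segment and append its bytes to out_path, in order.

    Assumption: unencrypted MPEG-TS segments, which can be concatenated as-is.
    """
    with open(out_path, 'wb') as out:
        for url in urls:
            with urllib.request.urlopen(url) as resp:
                out.write(resp.read())
```

With the variables from the answer's code, `pick_variant(r.text, m3u8_url)` could stand in for the hard-coded `split('\n')[2]`, and `download_segments(segment_urls(r2.text, m3u8_url_2), video_id + '.ts')` would write one file per video. If the result needs to be an mp4 or gif, a separate tool such as ffmpeg would still be needed for the conversion.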