I'm attempting to grab the episode title shown in the header of this website. When inspecting the page elements myself, I can see a line of HTML like this near the top:
<h1 id="epName">...</h1>
which, when I click on the ellipsis, expands to
<h1 id="epName">Friendship is Magic, Part 1</h1>
I've attempted to automate this so I can save the corresponding episodes under their actual titles instead of the season-episode codes I'm currently using.
I've tried a basic requests call:
url = 'https://fim.heartshine.gay/?s=1&e=1&res=480&lo=0'
x = requests.get(url)
text = x.text
print(text)
but the relevant part of the output was
</head>
<body onload="initPage();">
<h1 id="epName"></h1> <div>
with no actual info between the h1 tags.
I've also tried Selenium, as I guessed the title might be filled in by JavaScript:
from selenium import webdriver
driver = webdriver.Safari()
driver.get("https://g1.heartshine.gay/?s=1&e=46&res=480")
print(dir(driver))
driver.execute_script('changeEp') #this button controls the resulting epName
p_element = driver.page_source
print(p_element)
but again I get the same relevant output as above.
You don't need selenium here, as the data is fetched dynamically from this JSON file. You can use requests.get(url).json():
import requests
url = 'https://fim.heartshine.gay/db.json'
data = requests.get(url).json()
On how you locate such a source, see e.g. here. The fetching is done here.
The title for season 1 (s=1), episode 1 (e=1) would be:
data['series']['seasons'][0]['episodes'][0]['epTitle']
# 'Friendship is Magic, Part 1'
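Indexing by list position assumes the seasons and episodes are stored in order with no gaps. A minimal sketch of a lookup that matches on the seasNum and epNum fields instead is shown below. The helper name episode_title and the small sample dict are hypothetical stand-ins for the real db.json, and the int() casts assume those fields hold numbers or numeric strings:

```python
def episode_title(data, season, episode):
    """Return epTitle for the given season/episode numbers, or None if absent."""
    for seas in data['series']['seasons']:
        # Assumption: seasNum/epNum are ints or numeric strings
        if int(seas['seasNum']) != season:
            continue
        for ep in seas['episodes']:
            if int(ep['epNum']) == episode:
                return ep['epTitle']
    return None


# Hypothetical sample mirroring the nesting used above
sample = {
    'series': {
        'seasons': [
            {
                'seasNum': 1,
                'episodes': [
                    {'epNum': 1, 'epTitle': 'Friendship is Magic, Part 1'},
                    {'epNum': 2, 'epTitle': 'Friendship is Magic, Part 2'},
                ],
            },
        ],
    },
}

print(episode_title(sample, 1, 2))
```

With the real data you would pass the dict returned by requests.get(url).json() in place of sample.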
But it might be useful to store the data in a pd.DataFrame. E.g., using pd.json_normalize, you could do something like:
import pandas as pd
seasons = data['series']['seasons']
cols = ['seasNum', 'epNum', 'epTitle']
df = (pd.json_normalize(seasons,
                        record_path='episodes',
                        meta=['seasNum'])
      [cols]
      )
Output (reading head and tail with np.r_):
import numpy as np
df.iloc[np.r_[0:2, -2:0]]
     seasNum  epNum                      epTitle
0          1      1  Friendship is Magic, Part 1
1          1      2  Friendship is Magic, Part 2
260       14     22               Hat in the Way
261       14     23      Pony Life - New Series!
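Since the goal in the question is to save episodes under their titles, here is a rough sketch of turning a (season, episode, title) row into a file name. The helper safe_filename, the SxxEyy naming pattern, and the .mp4 extension are all assumptions for illustration, not something the site or the question prescribes:

```python
import re


def safe_filename(season, episode, title, ext='.mp4'):
    """Build a file name such as 'S01E01 - Title.mp4' (hypothetical pattern)."""
    # Replace characters that are not allowed in file names on common systems
    cleaned = re.sub(r'[\\/:*?"<>|]', '_', title).strip()
    return f'S{int(season):02d}E{int(episode):02d} - {cleaned}{ext}'


# Rows as (seasNum, epNum, epTitle), taken from the output shown above
rows = [
    (1, 1, 'Friendship is Magic, Part 1'),
    (14, 23, 'Pony Life - New Series!'),
]
names = [safe_filename(*row) for row in rows]
print(names)
```

With the DataFrame from above, the same rows could be obtained with df[cols].itertuples(index=False).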