I am trying to fetch the URL held in a tag from the HTML of the page https://95jun.kinoxor.pro/984-univer-13-let-spustja-2024-07-06-19-54.html
The site is tricky. You can open it from this search page (it is the first or second result): https://yandex.ru/search/?text=https%3A%2F%2Fkinokubok.pro%2F232-univer-13-let-spustja-2024-06-25-19-51.html&lr=21653
I am looking for this URL: <iframe src="https://api.stiven-king.com/storage.html" ...
How can I fetch the content of this HTML tag?
My code:
import seleniumwire.undetected_chromedriver as uc
import time

options = uc.ChromeOptions()
options.add_argument('--ignore-ssl-errors=yes')
options.add_argument('--ignore-certificate-errors')
driver = uc.Chrome(options=options)

def interceptor(request):
    del request.headers['Referer']
    request.headers['Referer'] = 'https://yandex.ru/'

url = "https://125jun.kinoamor.pro/251-univer-13-let-spustja-2024-06-27-19-51.html"
driver.request_interceptor = interceptor
driver.get(url)
time.sleep(3)

iframe_tag_elements = driver.find_elements("xpath", "//iframe")
print(f"FOUND VIDEO TAGS: {len(iframe_tag_elements)}")  # prints 7
for iframe_elem in iframe_tag_elements:
    video_url = iframe_elem.get_attribute("src")
    if video_url:
        print("XXX_ ", video_url)
**PROBLEM**: the URL "https://api.stiven-king.com/storage.html" is not printed.
I also don't see the URL in driver.page_source.
I tried sleeping and scrolling the page, but it didn't help.
I also tried driver.switch_to.frame(iframe_elem) and then searching for iframes again.
As suggested in the other answers, you need to switch to the <iframe> containing the link you are looking for. But instead of looking for the first <iframe>, you can provide a more specific locator:
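# imports needed by this snippet
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as ec

# replaced the url
url = "https://01jul.kinokubok.pro/232-univer-13-let-spustja-2024-07-03-20-19.html"
driver.request_interceptor = interceptor
driver.get(url)

# wait for the src-less iframe inside the video box and switch into it
WebDriverWait(driver, 30).until(ec.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR, "#dle-content .video-box > iframe:not([src])")))

# the search now runs inside that frame
iframe_tag_elements = driver.find_elements("xpath", "//iframe")
print(f"FOUND VIDEO TAGS: {len(iframe_tag_elements)}")
for iframe_elem in iframe_tag_elements:
    video_url = iframe_elem.get_attribute("src")
    if video_url:
        print("XXX_ ", video_url)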
Output
FOUND VIDEO TAGS: 1
XXX_ https://api.stiven-king.com/storage.html
If you want the src values of all the <iframe>s, you can build a recursive function to extract them.
To make the page load faster you can set page_load_strategy to 'eager', but be aware that you might have to add some waits if the script runs ahead of the page.
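The "add some wait" advice comes down to polling a condition until it holds or a timeout expires. Below is a minimal plain-Python sketch of that idea. The names wait_until and frames_ready are hypothetical and used only for illustration, and frames_ready is a stand-in for a real page check. With Selenium you would normally rely on WebDriverWait(driver, timeout).until(...), which does this kind of polling for you.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call `condition` repeatedly until it returns a truthy value.

    Raises TimeoutError if `timeout` seconds pass without a truthy result.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


# Stand-in for a page whose iframes only show up on the third check
# (an assumption for the demo, not the real site behaviour).
attempts = {"count": 0}


def frames_ready():
    attempts["count"] += 1
    return ["iframe"] if attempts["count"] >= 3 else []


found = wait_until(frames_ready, timeout=2.0, poll_interval=0.01)
print(found)
```

Against a real driver, the condition could be something like lambda: driver.find_elements("tag name", "iframe"), which returns an empty (falsy) list until at least one iframe exists.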
Complete code
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as ec
import seleniumwire.undetected_chromedriver as uc

options = uc.ChromeOptions()
options.add_argument('--ignore-ssl-errors=yes')
options.add_argument('--ignore-certificate-errors')
options.page_load_strategy = 'eager'
driver = uc.Chrome(options=options)

def interceptor(request):
    # replace the Referer so the request looks like it came from the search page
    del request.headers['Referer']
    request.headers['Referer'] = 'https://yandex.ru/'

def get_frame_data(frames):
    """Collect the src of every iframe in `frames` and of all iframes nested inside them."""
    src = []
    for frame in frames:
        video_url = frame.get_attribute("src")
        if video_url:
            src.append(video_url)
        # descend into the frame and collect its children recursively
        driver.switch_to.frame(frame)
        child_frames = driver.find_elements("xpath", "//iframe")
        if child_frames:
            src.extend(get_frame_data(child_frames))
        # go back up one level only, so the remaining sibling frames stay valid
        # (default_content() would jump to the top document even from a nested frame)
        driver.switch_to.parent_frame()
    return src

url = "https://01jul.kinokubok.pro/232-univer-13-let-spustja-2024-07-03-20-19.html"
driver.request_interceptor = interceptor
driver.get(url)

# with the 'eager' strategy, wait explicitly for the parts of the page we need
wait = WebDriverWait(driver, 10)
wait.until(ec.visibility_of_element_located(("id", "grid")))
wait.until(ec.visibility_of_element_located(("class name", "karusel")))

iframe_tag_elements = driver.find_elements("xpath", "//iframe")
all_src = get_frame_data(iframe_tag_elements)
for sr in all_src:
    print("XXX_ ", sr)
Output 1:
XXX_ https://api.marts.ws/embed/movie/74360
XXX_ https://loosening-as.allarknow.online/?token_movie=be2b9578d8cae35323bb199f888be1&token=b5c08f668c592ee23d32031d27de44
XXX_ https://www.youtube.com/embed/mthO33phh9U
XXX_ https://yastatic.net/share2/v-1.16.0/frame.html?namespace=ya-share2.0.7255632935282506
XXX_ https://yastatic.net/share2/v-1.16.0/frame.html?namespace=ya-share2.0.027842698862642123
Output 2:
XXX_ https://api.stiven-king.com/storage.html
XXX_ https://loosening-as.allarknow.online/?token_movie=be2b9578d8cae35323bb199f888be1&token=b5c08f668c592ee23d32031d27de44
XXX_ https://www.youtube.com/embed/mthO33phh9U
XXX_ https://yastatic.net/share2/v-1.16.0/frame.html?namespace=ya-share2.0.9638742292394189
XXX_ https://yastatic.net/share2/v-1.16.0/frame.html?namespace=ya-share2.0.8570699995377056