Hello, I am trying to extract a link from page_source. My code is:
from bs4 import BeautifulSoup
from selenium import webdriver
import time
import html5lib
driver_path = r"C:\Users\666\Desktop\New folder (8)\chromedriver.exe"
driver = webdriver.Chrome(driver_path)
driver.implicitly_wait(10)
driver.get("https://www.milversite.club/milver/outsiders-1x01-video_060893d7a.html")
try:
    time.sleep(4)
    iframe = driver.find_elements_by_tag_name('iframe')
    for i in range(0, len(iframe)):
        f = driver.find_elements_by_tag_name('iframe')[i]
        driver.switch_to.frame(i)
        # your work to extract link
        text = driver.find_element_by_tag_name('body').text
        print(text)
        driver.switch_to.default_content()
    output = driver.page_source
    print(output)
finally:
    driver.quit()
Now I want to scrape just this one link.
Try the script below to get the link you want to parse. You don't need to switch to the iframe to get the link, because the link is the src attribute of the iframe element in the main page. A hardcoded delay is a poor choice for parsing dynamic content: what if the link appears after 5 seconds? I used an explicit wait in the script below to make it robust.
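from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import ui

driver = webdriver.Chrome()
wait = ui.WebDriverWait(driver, 10)  # poll for up to 10 seconds
try:
    driver.get("https://www.milversite.club/milver/outsiders-1x01-video_060893d7a.html")
    # Wait until the iframe is in the DOM; the By.ID form works on Selenium 3 and 4
    elem = wait.until(lambda driver: driver.find_element(By.ID, "iframevideo"))
    # The link is the iframe's src attribute, so no frame switch is needed
    print(elem.get_attribute("src"))
finally:
    driver.quit()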
Output:
https://openload.co/embed/8wVwFQEP1Sw
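If you would rather work from the page_source string, as in your original attempt, something like the following might do. It is only a minimal sketch using the standard library's html.parser. The names IframeSrcParser, extract_iframe_src and sample_source are made up for illustration, and the sample markup is a stand-in, since I am assuming the real page has an iframe with id "iframevideo" as in the script above.

```python
from html.parser import HTMLParser


class IframeSrcParser(HTMLParser):
    """Collects the src attribute of every <iframe> with a given id."""

    def __init__(self, iframe_id):
        super().__init__()
        self.iframe_id = iframe_id
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names in lowercase
        if tag != "iframe":
            return
        attributes = dict(attrs)
        if attributes.get("id") == self.iframe_id and attributes.get("src"):
            self.sources.append(attributes["src"])


def extract_iframe_src(page_source, iframe_id="iframevideo"):
    """Return the src of the first matching iframe, or None if absent."""
    parser = IframeSrcParser(iframe_id)
    parser.feed(page_source)
    return parser.sources[0] if parser.sources else None


# Hypothetical stand-in for driver.page_source; the real markup may differ.
sample_source = """
<html><body>
  <iframe id="ads" src="https://example.com/banner"></iframe>
  <iframe id="iframevideo" src="https://openload.co/embed/8wVwFQEP1Sw"></iframe>
</body></html>
"""

print(extract_iframe_src(sample_source))
```

In a real run you would pass driver.page_source instead of sample_source, and only after the explicit wait has succeeded. Otherwise the iframe may not be in the source yet and the function returns None.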