python selenium-webdriver web-scraping beautifulsoup html5lib

How to get iframe source from page_source


Hello, I'm trying to extract a link from page_source. My code is:

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
import time
import html5lib

driver_path = r"C:\Users\666\Desktop\New folder (8)\chromedriver.exe"
driver = webdriver.Chrome(service=Service(driver_path))
driver.implicitly_wait(10)

driver.get("https://www.milversite.club/milver/outsiders-1x01-video_060893d7a.html")
try:
    time.sleep(4)
    iframes = driver.find_elements(By.TAG_NAME, 'iframe')
    for i in range(len(iframes)):
        # switch into the i-th iframe by index
        driver.switch_to.frame(i)
        #  your work to extract link
        text = driver.find_element(By.TAG_NAME, 'body').text
        print(text)
        # go back to the top-level document before the next iframe
        driver.switch_to.default_content()

    output = driver.page_source

    print(output)

finally:
    driver.quit()

Now I want to scrape just this one link: LINK


Solution

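  • Try the script below to get the link you want. You don't need to switch into the iframe to get the link: the URL is in the iframe element's src attribute, and the iframe element itself is part of the main document. A hard-coded delay is a fragile way to wait for dynamic content: what if the link appears after 5 seconds? The script below uses an explicit wait instead, which keeps checking for the element until it shows up (or a timeout expires), so it is more robust.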

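    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    wait = WebDriverWait(driver, 10)  # poll for up to 10 seconds
    try:
        driver.get("https://www.milversite.club/milver/outsiders-1x01-video_060893d7a.html")

        # The iframe element lives in the main document, so no switch_to.frame() is needed;
        # wait until it is present, then read its src attribute.
        elem = wait.until(EC.presence_of_element_located((By.ID, "iframevideo")))
        print(elem.get_attribute("src"))
    finally:
        driver.quit()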
    

    Output:

    https://openload.co/embed/8wVwFQEP1Sw
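If you already have driver.page_source, as in the question's script, you can also pull the link out of that HTML string with BeautifulSoup instead of asking the browser for it. This is a minimal sketch that assumes the iframe has id="iframevideo" (the id used in the script above); the sample HTML and the helper names iframe_src and all_iframe_srcs are hypothetical. It uses Python's built-in "html.parser"; passing "html5lib" instead should also work if that package is installed.

```python
from bs4 import BeautifulSoup


def iframe_src(html, iframe_id="iframevideo"):
    """Return the src of the iframe with the given id, or None if it is absent.

    `html` would be driver.page_source in the question's script. The default id
    "iframevideo" is taken from the answer and is an assumption for other pages.
    """
    soup = BeautifulSoup(html, "html.parser")  # built-in parser, no extra install
    iframe = soup.find("iframe", id=iframe_id)
    return iframe.get("src") if iframe is not None else None


def all_iframe_srcs(html):
    """Return the src of every <iframe> that has one, in document order."""
    soup = BeautifulSoup(html, "html.parser")
    # src=True matches only tags that actually carry a src attribute
    return [tag["src"] for tag in soup.find_all("iframe", src=True)]


# Hypothetical HTML standing in for driver.page_source.
sample = """
<html><body>
  <iframe id="ad" src="https://ads.example/frame"></iframe>
  <iframe id="iframevideo" src="https://openload.co/embed/8wVwFQEP1Sw"></iframe>
</body></html>
"""

print(iframe_src(sample))  # https://openload.co/embed/8wVwFQEP1Sw
print(all_iframe_srcs(sample))
```

Keep in mind that page_source is a snapshot of the page at the moment you read it. If the iframe is added by JavaScript after the page loads, you still need to wait for it (for example with the explicit wait shown above) before reading page_source; otherwise iframe_src can return None.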
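To see why an explicit wait is more robust than time.sleep(4), here is a simplified, hypothetical wait_until helper that mimics the idea behind WebDriverWait.until: it re-runs a check every poll seconds, treats certain exceptions as "not ready yet" (WebDriverWait ignores NoSuchElementException by default), and gives up with an error after timeout seconds. It is a sketch of the concept, not Selenium's implementation; LookupError stands in for NoSuchElementException so the example runs without a browser.

```python
import time


class Timeout(Exception):
    """Raised when the condition does not become truthy in time."""


def wait_until(condition, timeout=10.0, poll=0.5, ignored=(LookupError,)):
    """Call condition() every `poll` seconds until it returns a truthy value.

    Exceptions listed in `ignored` count as "not ready yet". Raises Timeout
    if `timeout` seconds pass without success. (Hypothetical helper.)
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            value = condition()
            if value:
                return value  # ready: return immediately, no wasted waiting
        except ignored:
            pass  # element not there yet; try again after a short pause
        if time.monotonic() >= deadline:
            raise Timeout(f"condition not met within {timeout} s")
        time.sleep(poll)


# Simulated page: the "element" only shows up on the third check.
checks = {"n": 0}


def find_link():
    checks["n"] += 1
    if checks["n"] < 3:
        raise LookupError("not rendered yet")  # stands in for NoSuchElementException
    return "https://openload.co/embed/8wVwFQEP1Sw"


print(wait_until(find_link, timeout=5, poll=0.01))
```

Unlike a fixed sleep, this returns as soon as the element is there (after two short polls in this simulation) and fails loudly if it never appears, instead of silently continuing with an incomplete page.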