
How to Access the Attributes in an iFrame in Python


I'm learning web scraping for data analysis.

I have successfully retrieved several elements of interest, such as the title, date, and upvotes, from this page: https://old.reddit.com/r/JoeRogan/comments/cmxmtc/jre_1330_bernie_sanders/.

I'd like to retrieve the original YouTube title of the video on this page, but I've been unable to access it. Here, the value I want is title="Joe Rogan Experience #1330 - Bernie Sanders".


I understand that elements inside iframes cannot be accessed directly. However, the Selenium documentation does not cover this case, at least not in enough detail for Python. Here, the iframe has no name attribute.

When I run driver.switch_to.frame(1), an error is returned. This confuses me, because the page certainly contains several iframe tags. driver.switch_to.frame(0) works, but I assume it simply refers to the main (default) page. I'm not sure how to identify the different iframes.
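For context, Selenium's switch_to.frame() accepts a zero-based integer index, a frame name or id string, or an iframe WebElement, and an integer index counts only the frames of the document the driver is currently in. So frame(0) enters the first iframe, not the top-level page; driver.switch_to.default_content() is what returns to the top level. A minimal sketch for listing the iframes visible from the current context follows. It assumes driver is an already-started Selenium WebDriver; the helper name list_iframes is hypothetical, and the locator strategy is passed as the plain string "tag name" (the value of selenium's By.TAG_NAME) so the helper needs no imports.

```python
def list_iframes(driver):
    """Return basic attributes of every <iframe> in the driver's current context.

    `driver` is assumed to be a Selenium WebDriver. "tag name" is the string
    value of selenium's By.TAG_NAME, used here so this helper needs no imports.
    """
    frames = []
    for index, element in enumerate(driver.find_elements("tag name", "iframe")):
        frames.append({
            "index": index,  # position among <iframe> elements, in document order
            "title": element.get_attribute("title"),  # None if the attribute is absent
            "src": element.get_attribute("src"),
        })
    return frames
```

Calling list_iframes(driver) after driver.get(...) shows the page's top-level iframes; calling it again after switching into one of them lists that frame's own iframes. The index reported here is the position among iframe elements in document order, which usually, but not necessarily, matches the integer accepted by switch_to.frame().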

I right-clicked the area and copied the XPath /html/body/iframe. It looks different from the examples I've seen online, and it ultimately did not work when I tried driver.find_element(By.XPATH, "/html/body/iframe"). I have also tried searching by TAG_NAME instead of XPATH.

I suspect the problem is related to something about the structure of the iframes that I'm missing. Any help on extracting the title attribute from this iframe would be greatly appreciated.


Solution

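  • The YouTube player is an iframe nested inside another iframe, so the driver first has to switch into the outer (second) iframe on the page. The title attribute can then be read from the inner iframe element without switching into it. Working code, with comments: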

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.wait import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    
    driver = webdriver.Chrome()
    driver.maximize_window()
    wait = WebDriverWait(driver, 10)
    
    try:
        driver.get("https://old.reddit.com/r/JoeRogan/comments/cmxmtc/jre_1330_bernie_sanders/")
    
        # Wait for the second <iframe> in document order (the outer, parent frame)
        # and switch the driver's context into it
        wait.until(EC.frame_to_be_available_and_switch_to_it((By.XPATH, "(//iframe)[2]")))
    
        # Inside the parent frame, locate the nested <iframe> element;
        # its attributes are readable without switching into it
        iframe = wait.until(EC.visibility_of_element_located((By.XPATH, "(//iframe)[1]")))
    
        # Print the required attribute value
        print(iframe.get_attribute("title"))
    
        # Return to the top-level document before working with the rest of the page
        driver.switch_to.default_content()
    finally:
        driver.quit()
    

    Console output:

    Joe Rogan Experience #1330 - Bernie Sanders
    
    Process finished with exit code 0
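The same two-step pattern can be generalised to any depth of nesting. The sketch below is a hypothetical helper, not part of Selenium: it starts from the top-level page, switches through each frame locator in turn (outermost first), reads the attribute from the target element, and always returns to the top level afterwards. Locators are (strategy, value) tuples such as (By.XPATH, "(//iframe)[2]"); By.XPATH is simply the string "xpath". Unlike the answer above, it uses plain find_element without explicit waits, so it assumes the frames have already loaded.

```python
def read_nested_frame_attribute(driver, frame_locators, target_locator, attribute):
    """Read `attribute` from an element nested inside one or more iframes.

    Hypothetical helper. `driver` is assumed to be a Selenium WebDriver.
    frame_locators: sequence of (strategy, value) tuples, outermost frame first,
        e.g. [("xpath", "(//iframe)[2]")]; "xpath" is the value of By.XPATH.
    target_locator: (strategy, value) tuple for the element, searched in the
        innermost frame reached.
    """
    driver.switch_to.default_content()  # always start from the top-level page
    try:
        for locator in frame_locators:
            # Find the frame element in the current context, then step into it
            driver.switch_to.frame(driver.find_element(*locator))
        return driver.find_element(*target_locator).get_attribute(attribute)
    finally:
        # Leave the driver at the top level whether or not a lookup failed
        driver.switch_to.default_content()
```

With the answer's locators, read_nested_frame_attribute(driver, [(By.XPATH, "(//iframe)[2]")], (By.XPATH, "(//iframe)[1]"), "title") should read the same title attribute, provided the page has finished loading. The try/finally block keeps later lookups from accidentally running inside a frame if one of the find_element calls raises.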