I'm learning web scraping for data analysis.
I have successfully retrieved several elements of interest from this page, such as the title, date, and upvotes: https://old.reddit.com/r/JoeRogan/comments/cmxmtc/jre_1330_bernie_sanders/
I'd like to retrieve the original YouTube title of the video on this page, but I've been unable to access it. In this case it is title="Joe Rogan Experience #1330 - Bernie Sanders".
I have read and understood that elements within iframes are not directly obtainable; however, the Selenium documentation does not cover this case, at least not sufficiently for Python. In this case the iframe has no name attribute.
When I run driver.switch_to.frame(1), an error is returned, which confuses me because there are certainly iframe tags in the structure of the page. driver.switch_to.frame(0) works, but I assume that's just referring to the primary, default page anyway. I'm not sure how to identify the different iframe windows.
I have right-clicked on the area and extracted the XPath /html/body/iframe, although this looks different from the examples I've seen online, and it ultimately did not work when I tried driver.find_element(By.XPATH, "/html/body/iframe"). I have also tried searching by TAG_NAME instead of XPATH.
I suspect I am missing something about how the iframes on this page are structured. Any help on how to extract the title attribute from this iframe would be greatly appreciated.
See the working code below with explanation:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome()
driver.maximize_window()
wait = WebDriverWait(driver, 10)
driver.get("https://old.reddit.com/r/JoeRogan/comments/cmxmtc/jre_1330_bernie_sanders/")
# Enter inside the parent IFRAME
wait.until(EC.frame_to_be_available_and_switch_to_it((By.XPATH, "(//iframe)[2]")))
# Store the target IFRAME node web element into variable
iframe = wait.until(EC.visibility_of_element_located((By.XPATH, "(//iframe)[1]")))
# Print the required attribute value
print(iframe.get_attribute("title"))
Console output:
Joe Rogan Experience #1330 - Bernie Sanders
Process finished with exit code 0
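On the question of how to tell the iframes apart: in Selenium, switch_to.frame(0) selects the first iframe of the current document (indexes are zero-based), not the main page. Below is a rough sketch of a helper that lists every iframe together with its index and identifying attributes, so you can see which one to switch into. The name list_iframes and the max_depth parameter are my own and not part of Selenium; the string "tag name" is the value behind By.TAG_NAME. It assumes the page does not re-render while the frames are being walked.

```python
def list_iframes(driver, depth=0, max_depth=2):
    """Return one dict per iframe, walking nested frames depth-first.

    Hypothetical helper: `driver` is any Selenium WebDriver instance.
    """
    found = []
    # "tag name" is the string value of By.TAG_NAME
    frames = driver.find_elements("tag name", "iframe")
    for index, frame in enumerate(frames):
        # Read the attributes before switching, while we are still
        # in the document that owns this iframe element.
        found.append({
            "depth": depth,
            "index": index,
            "id": frame.get_attribute("id"),
            "name": frame.get_attribute("name"),
            "title": frame.get_attribute("title"),
            "src": frame.get_attribute("src"),
        })
        if depth < max_depth:
            # Step into the iframe, collect its children, then step back out.
            driver.switch_to.frame(frame)
            found.extend(list_iframes(driver, depth + 1, max_depth))
            driver.switch_to.parent_frame()
    return found
```

With the driver from the code above, something like `for row in list_iframes(driver): print(row)` should show each frame's depth, index, and title. Call driver.switch_to.default_content() first if you have already switched into a frame, since the search always starts from the current context.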