I'm trying to make a web scraper that downloads an image located inside an iframe that is itself nested inside another iframe.
I can't get Selenium for Chrome to find the correct iframe to switch into. The main issue is that the iframe in question doesn't have a name or id, so I searched by index. I managed to get inside the parent, but I can't get inside the child iframe. If I set the index to 1, I get the next iframe in the outermost scope.
From looking into my webdriver object, I think the search is limited to the Red Rectangle, as that's what is in the page_source attribute of my variable "driver".
The object I want to reach is the img with the id pbk-page in the Green Rectangle. My code so far just gets the URL, then waits for the page to load using sleep (once I can navigate to the correct element I'll implement WebDriverWait). This is the navigation bit of code:
driver.switch_to.frame(0)
Image_link = driver.find_element(By.ID,'pbk-page')
Oh! I'm using Python.
I was stuck doing the exact same thing you were (maybe even scraping the same website?), and this is what worked for me:
iframe1 = driver.find_elements(By.XPATH, value="//iframe")[0]
driver.switch_to.frame(iframe1)
iframe2 = driver.execute_script("return document.querySelector(\"body > mosaic-book\").shadowRoot.querySelector(\"iframe\")")
driver.switch_to.frame(iframe2)
img = driver.find_elements(By.ID, value="pbk-page")
I am very much an amateur at using Selenium, but this is my best understanding of how this works: First, we're able to find the parent iframe iframe1, but our driver can't see anything inside of the shadow DOM. However, we can access the inside of the shadow DOM using JavaScript, so starting from the iframe, we can find the shadow host element mosaic-book, enter the shadow DOM, and return/pass out the child iframe iframe2. Then we can switch our driver into this iframe2 and access the image.
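For reference, here is a rough sketch of the same steps wrapped into one function, with a check for the case where the inner iframe isn't found. The function name find_page_image and the constant FIND_INNER_IFRAME_JS are made up for this example; the selectors are the ones from the snippet above. It assumes driver is an already-loaded Selenium WebDriver, and it passes the locator strategy as the plain string "id", which is the value of By.ID.

```python
# JavaScript that looks up the shadow host and returns the iframe inside its
# shadow root, or null if either piece is missing. Selectors come from the
# answer above; they are specific to that page.
FIND_INNER_IFRAME_JS = (
    'const host = document.querySelector("body > mosaic-book");'
    'return host && host.shadowRoot'
    ' ? host.shadowRoot.querySelector("iframe") : null;'
)


def find_page_image(driver):
    """Switch through both iframes and return the element with id pbk-page.

    `driver` is assumed to be a Selenium WebDriver that has already loaded
    the page (hypothetical helper, not part of Selenium itself).
    """
    driver.switch_to.default_content()  # start from the top-level document
    driver.switch_to.frame(0)           # outer iframe, selected by index

    # The driver's normal lookups don't reach into the shadow DOM here,
    # so ask the browser for the inner iframe via JavaScript.
    inner = driver.execute_script(FIND_INNER_IFRAME_JS)
    if inner is None:
        raise RuntimeError("inner iframe not found inside the shadow root")

    driver.switch_to.frame(inner)
    # "id" is the string value of selenium.webdriver.common.by.By.ID
    return driver.find_element("id", "pbk-page")
```

Once this returns, calling get_attribute("src") on the result should give the image URL to download, assuming the img element carries a src attribute.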
There very well might be a more elegant way to do this, but this is what worked for me.
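One possibly tidier variant, which I have not verified against this particular site: Selenium 4.1 and later expose a shadow_root property on elements, so the JavaScript step can be swapped for a regular lookup. The sketch below assumes a Chromium-based driver and an open shadow root on mosaic-book; the function name is made up for this example, and "css selector" and "id" are the string values of By.CSS_SELECTOR and By.ID.

```python
def find_page_image_via_shadow_root(driver):
    """Same navigation as before, but using Selenium 4's shadow_root.

    Hypothetical helper; assumes `driver` is a Selenium 4.1+ WebDriver
    positioned on the top-level document of the loaded page.
    """
    driver.switch_to.frame(0)  # outer iframe, selected by index

    # Find the shadow host with a normal lookup, then search inside its
    # shadow root. Shadow roots are queried here with a CSS selector.
    host = driver.find_element("css selector", "body > mosaic-book")
    inner = host.shadow_root.find_element("css selector", "iframe")

    driver.switch_to.frame(inner)
    return driver.find_element("id", "pbk-page")
```

If the host element has no shadow root, accessing shadow_root raises an exception instead of returning None, so the failure shows up at that line.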