
Why can't Selenium find Reddit comments by their ID?


I've been using Selenium to take screenshots of Reddit posts and comments, and I've run into an issue that I can't find a fix for online. My code gives Selenium the ID of the element I want to screenshot, and for the main Reddit post itself this works great. For a comment, though, it always either times out (when using EC.presence_of_element_located()) or says that it can't find the element (when using driver.find_element()).

Here's the code:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

def getScreenshotOfPost(header, ID, url):
    driver = webdriver.Chrome()  # Use Chrome as the web driver
    try:
        driver.get(url)  # Load the Reddit URL
        driver.set_window_size(width=400, height=1600)
        wait = WebDriverWait(driver, 30)
        driver.execute_script("window.focus();")
        method = By.ID  # ID is what I've found to be the most reliable lookup method
        # header is "t3_" for posts and "t1_" for comments; ID is the post or comment ID
        handle = f"{header}{ID}"

        element = wait.until(EC.presence_of_element_located((method, handle)))
        driver.execute_script("window.focus();")

        # Save a screenshot of just this element
        with open(f"Post_{header}{ID}.png", "wb") as fp:
            fp.write(element.screenshot_as_png)
    finally:
        driver.quit()  # Close the browser even if the lookup times out

I've tried searching by ID, CLASS_NAME, CSS_SELECTOR, and XPATH, and none of them work. I've double-checked that t1_{the comment's ID} is the correct element ID for a comment, regardless of which Reddit post it is on. Increasing the wait time on my web driver doesn't help either, so I'm not sure what the issue could be.

Thanks in advance for any help!


Solution

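  • I see what the problem is: there are a ton of nested shadow roots on the page. If you are familiar with IFRAMEs, they behave similarly. Selenium's locators (ID, class name, CSS selector, XPath) only search the current document, so from outside they cannot see the DOM inside a shadow root, which is why every strategy you tried failed. With an IFRAME you switch Selenium's context into the frame; with a shadow root you first find the element that hosts it, take its shadow_root, and search from there. Because the roots are nested, you have to repeat this one level at a time, diving deeper until you reach the element you want.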

    Here is some example code that uses a public shadow DOM demo page rather than Reddit:

    from selenium.webdriver import Chrome
    from selenium.webdriver.common.by import By

    def test_recommended_code():
        driver = Chrome()
        try:
            driver.get('http://watir.com/examples/shadow_dom.html')

            # Find the element hosting the shadow root, then search inside that root
            shadow_host = driver.find_element(By.CSS_SELECTOR, '#shadow_host')
            shadow_root = shadow_host.shadow_root
            shadow_content = shadow_root.find_element(By.CSS_SELECTOR, '#shadow_content')

            assert shadow_content.text == 'some text'
        finally:
            driver.quit()

    You can read more about it in this article.
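Since each shadow root has to be entered one level at a time, that descent can be wrapped in a small loop. The sketch below is a hypothetical helper (find_in_shadow_chain is not part of Selenium). It assumes Selenium 4.1 or later, where WebElement.shadow_root is available, and it uses CSS selectors at every level, the same strategy the example above uses inside the shadow root. It also assumes you already know the CSS selector of each shadow host between the document and the comment; Reddit's actual chain is not given here and has to be found by inspecting the page in DevTools.

```python
# Selenium's By.CSS_SELECTOR is the string "css selector"; the literal is used
# here so this helper does not need to import Selenium itself.
CSS_SELECTOR = "css selector"


def find_in_shadow_chain(context, host_selectors, target_selector):
    """Descend through nested shadow roots, then return the target element.

    context         -- where to start searching (a WebDriver, WebElement or ShadowRoot)
    host_selectors  -- CSS selectors of each shadow host, outermost first; each
                       one is looked up inside the previous host's shadow root
    target_selector -- CSS selector of the element inside the innermost root
    """
    for selector in host_selectors:
        host = context.find_element(CSS_SELECTOR, selector)
        # Step into this host's shadow root; the next search starts from there
        context = host.shadow_root
    return context.find_element(CSS_SELECTOR, target_selector)
```

In the question's function, the wait.until(...) line could then become something like wait.until(lambda d: find_in_shadow_chain(d, hosts, "#" + handle)), where hosts is the (hypothetical) list of shadow-host selectors you found for the comment. WebDriverWait ignores NoSuchElementException by default, so it keeps retrying while any lookup in the chain raises that exception.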