python selenium-webdriver

Webpage Only Scrolls Once Using Selenium Despite New Content Loading


I'm trying to scrape URLs from a dynamically loaded webpage that requires continuous scrolling to load all the content into the DOM. My approach is to run window.scrollTo(0, document.body.scrollHeight); in a loop using Selenium's execute_script function. After each scroll, I compare the number of URLs present before and after the scroll. If the number doesn't change within the timeout, I assume the end of the page has been reached and break out of the loop.

However, the loop exits after the first scroll as though all content were already in the DOM, even though I know the page loads new content well within the given timeout. Below is my code:

def _scroll_page_to_bottom(self, timeout: int):  # Todo: Fix Bugs
    while True:
        urls_before_scroll = self.browser.find_elements(
            By.XPATH, read_xpath(self.scrape_programs_urls.__name__, "programs_urls")
        )
        self.browser.execute_script("window.scrollTo(0, document.body.scrollHeight);")

        # Wait for new content to be loaded
        try:
            WebDriverWait(self.browser, timeout).until(
                lambda _: len(self.browser.find_elements(
                    By.XPATH, read_xpath(self.scrape_programs_urls.__name__, "programs_urls"))
                ) > len(urls_before_scroll)
            )
        except TimeoutException:
            # If no new content is loaded within the timeout, assume we've reached the end of the page
            break

Can anyone suggest what could be causing the issue in the above code?

Edit: I did some debugging and found that the issue is specifically related to the scroll itself. When I execute window.scrollTo(0, document.body.scrollHeight); in the browser console, the page doesn't scroll to the bottom either, which explains why my code is not working. The site I am trying to scrape is https://hackerone.com/opportunities/all/search


Solution

  • The code below does scroll this page down, by bringing the bottom of the content pane element into view. Try using it in your loop in place of the window.scrollTo call:

    ele = driver.find_element(By.XPATH, '//div[contains(@class,"Pane-module_u1-pane__content")]')
    driver.execute_script('arguments[0].scrollIntoView(false);', ele)
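
For illustration, here is a minimal sketch of how the snippet above might be combined with the question's "scroll, then wait for more URLs" loop. The function name scroll_until_no_new_items and its parameters are hypothetical, not part of the original code. The sketch passes the locator strategy as the plain string "xpath" (the value of Selenium's By.XPATH) and uses a simple polling loop instead of WebDriverWait, so it only relies on the driver's find_element, find_elements and execute_script methods.

```python
import time

XPATH = "xpath"  # same string value as selenium.webdriver.common.by.By.XPATH


def scroll_until_no_new_items(driver, container_xpath, item_xpath,
                              timeout=10.0, poll=0.5):
    """Scroll the content pane until no new items appear within `timeout` seconds.

    Hypothetical helper: returns the number of matching items found at the end.
    """
    while True:
        count_before = len(driver.find_elements(XPATH, item_xpath))

        # Re-locate the pane on every pass in case the page re-renders it.
        container = driver.find_element(XPATH, container_xpath)
        # scrollIntoView(false) aligns the element's bottom with the bottom
        # of the visible area, which is what triggers further loading here.
        driver.execute_script("arguments[0].scrollIntoView(false);", container)

        # Poll until the item count grows or the timeout expires.
        deadline = time.monotonic() + timeout
        while len(driver.find_elements(XPATH, item_xpath)) <= count_before:
            if time.monotonic() >= deadline:
                # Nothing new was loaded in time: assume the end was reached.
                return count_before
            time.sleep(poll)
```

With a real Selenium driver this could be called as scroll_until_no_new_items(self.browser, '//div[contains(@class,"Pane-module_u1-pane__content")]', read_xpath(self.scrape_programs_urls.__name__, "programs_urls"), timeout), assuming the pane's class name on the site has not changed. The inner polling loop could equally be replaced by the WebDriverWait call from the question.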