
Playwright auto-scroll to bottom of infinite-scroll page


I am trying to automate the scraping of a site with "infinite scroll" with Python and Playwright.

The issue is that Playwright doesn't include, as of yet, a scroll functionality, let alone an infinite auto-scroll functionality.

From what I found on the net and my personal testing, I can automate an infinite or finite scroll using the page.evaluate() function and some JavaScript code.

For example, this works:

for i in range(20):
    page.evaluate('var div = document.getElementsByClassName("comment-container")[0];div.scrollTop = div.scrollHeight')
    page.wait_for_timeout(500)

The problem with this approach is that it only works with a fixed number of scrolls, or with a while True loop that keeps going forever.

I need to find a way to tell it to keep scrolling until the final content loads.

This is the JavaScript that I am currently trying in page.evaluate():

var intervalID = setInterval(function() {
    var scrollingElement = (document.scrollingElement || document.body);
    scrollingElement.scrollTop = scrollingElement.scrollHeight;
    console.log('fail')
}, 1000);
var anotherID = setInterval(function() {
    if ((window.innerHeight + window.scrollY) >= document.body.offsetHeight) {
        clearInterval(intervalID);
    }}, 1000)

This does not work in either my Firefox browser or the Playwright Firefox browser. It returns immediately and doesn't execute the code at intervals.
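A note on the immediate return: setInterval only schedules its callbacks and hands back a timer ID at once, so page.evaluate() receives its result straight away. page.evaluate() waits only when the evaluated function returns a Promise. The sketch below wraps the interval in a Promise that resolves once the scroll height has stopped growing for a few ticks. The names SCROLL_JS and scroll_in_page, the 1000 ms interval, and the three-tick threshold are illustrative assumptions, not part of Playwright.

```python
# JavaScript run inside the page. It returns a Promise, so
# page.evaluate() blocks until resolve() is called.
SCROLL_JS = """
() => new Promise((resolve) => {
    let lastHeight = 0;
    let idleTicks = 0;
    const timer = setInterval(() => {
        const el = document.scrollingElement || document.body;
        el.scrollTop = el.scrollHeight;
        if (el.scrollHeight === lastHeight) {
            idleTicks += 1;            // nothing new was loaded this tick
        } else {
            idleTicks = 0;             // page grew, keep scrolling
            lastHeight = el.scrollHeight;
        }
        if (idleTicks >= 3) {          // assumed threshold: 3 quiet ticks
            clearInterval(timer);
            resolve(lastHeight);
        }
    }, 1000);
})
"""


def scroll_in_page(page):
    """Run the scroll loop in the page and return the final scroll height.

    `page` is assumed to be a Playwright sync-API Page object.
    """
    return page.evaluate(SCROLL_JS)
```

If the content sits in an inner container such as the comment-container div from the earlier example, the element lookup in the script would need to target that container instead.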

I would be grateful if someone could tell me how I can, using Playwright, create an auto-scroll function that will detect and stop when it reaches the bottom of a dynamically loading webpage.


Solution

  • Newer Playwright versions have a scroll function: page.mouse.wheel(delta_x, delta_y). The code below attempts to scroll through youtube.com, which has an "infinite scroll":

    from playwright.sync_api import Playwright, sync_playwright
    import time
    
    
    def run(playwright: Playwright) -> None:
        browser = playwright.chromium.launch(headless=False)
        context = browser.new_context()
    
        # Open new page
        page = context.new_page()
    
        page.goto('https://www.youtube.com/')
    
        # page.mouse.wheel(delta_x, delta_y): delta_x scrolls horizontally,
        # delta_y vertically (positive scrolls down, negative scrolls up)
        for _ in range(5):  # make the range as long as needed
            page.mouse.wheel(0, 15000)
            time.sleep(2)
            
        
        time.sleep(15)
        # ---------------------
        context.close()
        browser.close()
    
    
    with sync_playwright() as playwright:
        run(playwright)
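
The loop above still scrolls a fixed number of times. To stop at the bottom, one option is to compare the document height before and after each scroll and quit once it stops growing. The following is a minimal sketch under that assumption; the helper name scroll_until_stable and its parameters (pause_ms, max_rounds, stable_rounds) are hypothetical, and the pause has to be long enough for the site to load its next batch.

```python
def scroll_until_stable(page, pause_ms=1000, max_rounds=50, stable_rounds=2):
    """Scroll down until the page height stops growing.

    `page` is assumed to be a Playwright sync-API Page.
    Returns the number of scroll steps performed.
    """
    height_js = "document.documentElement.scrollHeight"
    last_height = page.evaluate(height_js)
    unchanged = 0   # consecutive scrolls that loaded nothing new
    rounds = 0

    # max_rounds is a safety cap for feeds that never end.
    while rounds < max_rounds and unchanged < stable_rounds:
        page.mouse.wheel(0, 15000)       # scroll down
        page.wait_for_timeout(pause_ms)  # give new content time to load
        rounds += 1

        new_height = page.evaluate(height_js)
        if new_height == last_height:
            unchanged += 1
        else:
            unchanged = 0
            last_height = new_height

    return rounds
```

In the run() function above, calling scroll_until_stable(page) would take the place of the for loop. Requiring two unchanged measurements in a row reduces the chance of stopping early when a single batch is slow to load.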