
Don't wait for a page to load using Selenium in Python


How do I make Selenium click on elements and scrape data before the page has fully loaded? My internet connection is quite terrible, so it sometimes takes forever to load the page entirely. Is there any way around this?


Solution

  • Update (7 July 2023)

    page_load_strategy

    page_load_strategy is now an attribute of the Options object. The minimal code block to configure page_load_strategy with Selenium v4.6 and above is as follows:

    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    
    options = Options()
    # options.page_load_strategy = 'none'
    options.page_load_strategy = 'eager'
    # options.page_load_strategy = 'normal'
    driver = webdriver.Chrome(options=options)
    driver.get("https://google.com")
    

    ChromeDriver 77.0 (which supports Chrome version 77) now supports eager as pageLoadStrategy.

    Resolved issue 1902: Support eager page load strategy [Pri-2]


    Your question mentions clicking on elements and scraping data before the page has fully loaded. For this you can use the pageLoadStrategy capability. By default, Selenium loads a page/URL with pageLoadStrategy set to normal. Selenium can start executing the next line of code at different document readiness states. Currently Selenium supports 3 document readiness states, which you can configure through pageLoadStrategy as follows:

    1. none (undefined)
    2. eager (page becomes interactive)
    3. normal (complete page load)
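
The three strategies line up with values of document.readyState in the browser. With none, driver.get() returns without waiting, so a script may want to check readiness itself. The following is a minimal sketch and not part of the Selenium API: READY_STATE_FOR_STRATEGY and wait_for_ready_state are hypothetical names, and the only thing assumed of the driver is that it exposes execute_script().

```python
import time

# Document readiness state that each pageLoadStrategy waits for.
# "none" waits for nothing, so it has no corresponding state.
READY_STATE_FOR_STRATEGY = {
    "none": None,
    "eager": "interactive",
    "normal": "complete",
}

# Order in which document.readyState advances while a page loads.
_READY_STATE_ORDER = ["loading", "interactive", "complete"]


def wait_for_ready_state(driver, target="interactive", timeout=10.0, poll=0.1):
    """Poll document.readyState until it reaches `target` or a later state.

    Hypothetical helper: `driver` is any object exposing execute_script(),
    such as a Selenium WebDriver. Returns True on success, False on timeout.
    """
    wanted = _READY_STATE_ORDER.index(target)
    deadline = time.monotonic() + timeout
    while True:
        state = driver.execute_script("return document.readyState")
        if state in _READY_STATE_ORDER and _READY_STATE_ORDER.index(state) >= wanted:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
```

On a slow connection, calling wait_for_ready_state(driver, "interactive") after driver.get() under the none strategy behaves roughly like eager, but with a timeout you control.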

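    Here is the code block to configure pageLoadStrategy through DesiredCapabilities, the older Selenium 3 style (the capabilities, firefox_binary, and executable_path arguments are deprecated in Selenium 4):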

    from selenium import webdriver
    from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
    
    binary = r'C:\Program Files\Mozilla Firefox\firefox.exe'
    # Copy the defaults so the shared class-level dict is not mutated
    caps = DesiredCapabilities.FIREFOX.copy()
    # caps["pageLoadStrategy"] = "normal"  #  complete
    caps["pageLoadStrategy"] = "eager"  #  interactive
    # caps["pageLoadStrategy"] = "none"   #  undefined
    driver = webdriver.Firefox(capabilities=caps, firefox_binary=binary, executable_path=r"C:\Utility\BrowserDrivers\geckodriver.exe")
    driver.get("https://google.com")