python google-chrome selenium-webdriver xpath

Selenium WebDriver for Chrome slows down with each loop iteration in Python


I want to go through 50 groups (site sections) with 100,000 page IDs each, but Selenium gets slower with every iteration. What could be causing this?

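import time

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# groups, ids and url are defined elsewhere in the original script
chrome_driver_way = './chromedriver/chromedriver.exe'
options = Options()
options.add_argument('--no-sandbox')
options.add_argument('--headless')
options.add_argument('--disable-dev-shm-usage')
options.add_argument('--disable-gpu')
options.add_argument("--disable-extensions")
options.add_argument("--disable-infobars")
options.add_argument("--log-level=2")
options.add_argument('--ignore-ssl-errors')
options.add_argument('--ignore-certificate-errors')

for group in groups:
    texts = [None] * len(ids)
    # A new browser is started for every group; in Selenium 4 the driver path
    # is passed through Service (executable_path is deprecated)
    with webdriver.Chrome(
            service=Service(chrome_driver_way),
            options=options
    ) as browser:
        start = time.time()
        for n, page_id in enumerate(ids):
            browser.get(url.format(page_id))
            # Keep the text of the first element matching the XPath
            elements = browser.find_elements(By.XPATH, '//div/div/div')
            texts[n] = elements[0].text
            # Report the elapsed time for every block of 100 pages
            if n != 0 and n % 100 == 0:
                print('group {}: IDs {}-{} processed in {} s'.format(group, n - 100, n, time.time() - start))
                start = time.time()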

Logs:

group 1: IDs 100-200 processed in 8.312201976776123 s 
group 1: IDs 1100-1200 processed in 9.060782194137573 s 
group 1: IDs 2100-2200 processed in 11.111422777175903 s 
... 
group 1: IDs 14100-14200 processed in 36.37690353393555 s
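To put numbers on the slowdown, the sketch below parses log lines in the format printed above and converts each 100-page batch time into an average time per page. It uses only the four lines shown; the elided "..." lines are not reconstructed. By this arithmetic a page takes about 83 ms at the start and about 364 ms after roughly 14,000 pages, which is about 4.4 times slower.

```python
import re

# Log lines copied from the output above (the elided "..." lines are omitted)
LOG = """\
group 1: IDs 100-200 processed in 8.312201976776123 s
group 1: IDs 1100-1200 processed in 9.060782194137573 s
group 1: IDs 2100-2200 processed in 11.111422777175903 s
group 1: IDs 14100-14200 processed in 36.37690353393555 s
"""

# Matches the format produced by the print() call in the loop
PATTERN = re.compile(r"IDs (\d+)-(\d+) processed in ([\d.]+) s")


def per_page_ms(log: str) -> list[tuple[int, float]]:
    """Return (first ID of the batch, average milliseconds per page) for each line."""
    rows = []
    for match in PATTERN.finditer(log):
        first, last, seconds = int(match[1]), int(match[2]), float(match[3])
        rows.append((first, seconds / (last - first) * 1000))
    return rows


rows = per_page_ms(LOG)
for first, ms in rows:
    print(f"from ID {first}: {ms:.0f} ms per page")

# Ratio between the last and the first measured batch
print(f"slowdown: {rows[-1][1] / rows[0][1]:.1f}x")
```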

Versions:

Python 3.10
Selenium 4.1.0
Chrome 133.0.6943.142

Solution

  • Thanks to @JeffC and @steve-ed: the problem was the Chrome tab. Reusing a single tab for thousands of page loads made it slower and slower, so I now open a fresh tab and close the old ones every 100 iterations:

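        if n != 0 and n % 100 == 0:
            print('group {}: IDs {}-{} processed in {} s'.format(group, n - 100, n, time.time() - start))
            # Open a fresh tab; Selenium moves focus to it automatically
            browser.switch_to.new_window('tab')
            curr = browser.current_window_handle
            # Close every older tab
            for handle in browser.window_handles:
                if handle != curr:
                    browser.switch_to.window(handle)
                    browser.close()
            # Switch back explicitly: after close() the driver has no valid focused window
            browser.switch_to.window(curr)
            start = time.time()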
    

    With this change the timing stays stable at roughly 9 to 11 s per 100 IDs:

    group 1: IDs 100-200 processed in 11.281310081481934 s 
    group 1: IDs 1100-1200 processed in 9.197898626327515 s 
    group 1: IDs 2100-2200 processed in 10.045527458190918 s 
    ... 
    group 1: IDs 5100-5200 processed in 9.298804521560669 s
    ...
    group 1: IDs 14100-14200 processed in 9.699942350387573 s
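The tab-recycling step can also be pulled out into a helper so the main loop stays short. This is a sketch, not the poster's exact code: recycle_tab and FakeDriver are hypothetical names. recycle_tab relies only on the Selenium 4 calls already used in the snippet above (switch_to.new_window, current_window_handle, window_handles, switch_to.window, close). FakeDriver is a minimal stand-in that imitates those calls so the logic can be checked without Chrome.

```python
def recycle_tab(driver) -> str:
    """Open a fresh tab, close every other tab and leave focus on the new one.

    Intended for a Selenium 4 WebDriver; returns the handle of the new tab.
    """
    driver.switch_to.new_window('tab')   # focus moves to the new tab
    new_handle = driver.current_window_handle
    for handle in list(driver.window_handles):
        if handle != new_handle:
            driver.switch_to.window(handle)
            driver.close()               # closes the tab that currently has focus
    driver.switch_to.window(new_handle)  # otherwise focus would point at a closed tab
    return new_handle


class _FakeSwitchTo:
    """Hypothetical stand-in for driver.switch_to (testing only)."""

    def __init__(self, driver):
        self._driver = driver

    def new_window(self, type_hint=None):
        self._driver._counter += 1
        handle = f"tab-{self._driver._counter}"
        self._driver._handles.append(handle)
        self._driver._current = handle

    def window(self, handle):
        if handle not in self._driver._handles:
            raise RuntimeError(f"no such window: {handle}")
        self._driver._current = handle


class FakeDriver:
    """Hypothetical minimal driver that only tracks open tabs (testing only)."""

    def __init__(self):
        self._counter = 1
        self._handles = ["tab-1"]
        self._current = "tab-1"
        self.switch_to = _FakeSwitchTo(self)

    @property
    def current_window_handle(self):
        if self._current is None:
            raise RuntimeError("the focused window was closed")
        return self._current

    @property
    def window_handles(self):
        return list(self._handles)

    def close(self):
        self._handles.remove(self._current)
        self._current = None


driver = FakeDriver()
for _ in range(3):
    recycle_tab(driver)
print(driver.window_handles, driver.current_window_handle)
```

In the loop from the question, the body of the "if n != 0 and n % 100 == 0:" block would then reduce to the print() call, recycle_tab(browser) and the timer reset.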