I want to go through 50 groups (site sections) with 100,000 page IDs in each, but Selenium is getting slower with each iteration. What could be the problem?
import time

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

# groups, ids and url are defined elsewhere
chrome_driver_way = './chromedriver/chromedriver.exe'
options = Options()
options.add_argument('--no-sandbox')
options.add_argument('--headless')
options.add_argument('--disable-dev-shm-usage')
options.add_argument('--disable-gpu')
options.add_argument("--disable-extensions")
options.add_argument("--disable-infobars")
options.add_argument("--log-level=2")
options.add_argument('--ignore-ssl-errors')
options.add_argument('--ignore-certificate-errors')

for group in groups:
    texts = [None] * len(ids)
    # one browser instance per group
    with webdriver.Chrome(
        executable_path=chrome_driver_way,
        options=options
    ) as browser:
        start = time.time()
        for n, id in enumerate(ids):
            browser.get(url.format(id))
            text = browser.find_elements(
                By.XPATH, '//div/div/div'
            )
            texts[n] = text[0].text
            # report the time taken by every batch of 100 pages
            if n != 0 and n % 100 == 0:
                print('group {}: IDs {}-{} processed in {} s'.format(group, n - 100, n, time.time() - start))
                start = time.time()
            del text
Logs:
group 1: IDs 100-200 processed in 8.312201976776123 s
group 1: IDs 1100-1200 processed in 9.060782194137573 s
group 1: IDs 2100-2200 processed in 11.111422777175903 s
...
group 1: IDs 14100-14200 processed in 36.37690353393555 s
Versions:
Python 3.10
Selenium 4.1.0
Chrome 133.0.6943.142
Thanks to @JeffC and @steve-ed: the problem was with Chrome tabs. Loading every page in the same tab made it slower over time, so I added tab re-creation every 100 iterations:
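if n != 0 and n % 100 == 0:
    print('group {}: IDs {}-{} processed in {} s'.format(group, n - 100, n, time.time() - start))
    # open a fresh tab (Selenium switches to it) and close all the others
    browser.switch_to.new_window('tab')
    curr = browser.current_window_handle
    for handle in browser.window_handles:
        if handle != curr:
            browser.switch_to.window(handle)
            browser.close()
    # make sure the fresh tab is the active one before continuing
    browser.switch_to.window(curr)
    start = time.time()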
And now it gives a stable result:
group 1: IDs 100-200 processed in 11.281310081481934 s
group 1: IDs 1100-1200 processed in 9.197898626327515 s
group 1: IDs 2100-2200 processed in 10.045527458190918 s
...
group 1: IDs 5100-5200 processed in 9.298804521560669 s
...
group 1: IDs 14100-14200 processed in 9.699942350387573 s
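
For anyone who wants to reuse this, the tab re-creation can be wrapped in a small helper. This is only a minimal sketch of the same idea, not the exact code I run: `recycle_tab` is a name I made up, and it assumes Selenium 4, where `switch_to.new_window` is available. It relies only on `switch_to.new_window`, `switch_to.window`, `window_handles`, `current_window_handle` and `close`.

```python
def recycle_tab(driver):
    """Open a fresh tab, close every other tab, and leave the driver on the fresh one.

    `driver` is assumed to be a Selenium 4 WebDriver (or any object that
    exposes the same attributes and methods used below).
    """
    # new_window('tab') opens a new tab and switches the driver to it
    driver.switch_to.new_window('tab')
    fresh = driver.current_window_handle

    # iterate over a copy, because closing tabs changes the list of handles
    for handle in list(driver.window_handles):
        if handle != fresh:
            driver.switch_to.window(handle)
            driver.close()

    # after close() the driver points at a closed tab, so switch back
    driver.switch_to.window(fresh)
    return fresh
```

In the loop from the question it would be called as `recycle_tab(browser)` inside the `if n != 0 and n % 100 == 0:` branch. The interval of 100 is just the value that worked for me, other sites may need a different one.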