
How can I scroll to the bottom of a page with Selenium?


I'm trying to use this code to scroll down to the end of a page:

from selenium import webdriver

url = 'http://www.tradingview.com/screener'
driver = webdriver.Firefox()
driver.get(url)

# Get scroll height
last_height = driver.execute_script("return document.body.scrollHeight")

while True:
    # Scroll down to bottom
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

# will give a list of all tickers
tickers = driver.find_elements_by_css_selector('a.tv-screener__symbol') 

for index in range(len(tickers)):
    print("Row " + tickers[index].text + " ") 

But the while loop never ends: Selenium keeps trying to scroll down even after it reaches the bottom of the page, so the program never proceeds. How can I detect that the bottom of the page has been reached so that the code can continue?
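The unused `last_height` variable in the question points at a common way to detect the bottom: record `document.body.scrollHeight`, scroll, wait, and stop once the height no longer grows. This is a minimal sketch of that general technique, not the method used in the answer. The function name, `pause` and `max_rounds` are assumptions, and it only works if the page grows `document.body` as new content loads (a page that scrolls inside its own container may not).

```python
import time


def scroll_to_bottom(driver, pause=2.0, max_rounds=100):
    """Scroll until document.body.scrollHeight stops growing.

    `driver` is anything with a Selenium-style execute_script() method.
    `pause` and `max_rounds` are assumed tuning values, not numbers
    taken from any particular page.
    Returns the number of scrolls performed.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    for rounds in range(1, max_rounds + 1):
        # Scroll down to the current bottom of the page
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        # Give the page time to load the next batch of content
        time.sleep(pause)
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            # Height did not change after a scroll: assume we hit the bottom
            return rounds
        last_height = new_height
    # Safety cap so the loop cannot run forever
    return max_rounds
```

With a live driver you would call `scroll_to_bottom(driver)` right after `driver.get(url)` and only then collect the tickers.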


Solution

  • Under the ticker, the page shows how many rows (matches) are in the table. So one option is to compare the number of visible rows with that total and quit the loop once they are equal.

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    
    url = 'http://www.tradingview.com/screener'
    driver = webdriver.Firefox()
    driver.get(url)
    
    # Read the total number of matches shown on the page (first word of the text)
    selector = '.js-field-total.tv-screener-table__field-value--total'
    matches = driver.find_element(By.CSS_SELECTOR, selector)
    matches = int(matches.text.split()[0])
    
    visible_rows = 0
    scrolls = 0
    
    while visible_rows < matches:
    
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    
        # Recount the rows only every 10 scrolls; counting is slower than scrolling
        if scrolls == 10:
            table = driver.find_elements(By.CLASS_NAME, 'tv-data-table__tbody')
            visible_rows = len(table[1].find_elements(By.TAG_NAME, 'tr'))
            scrolls = 0
    
        scrolls += 1
    
    # will give a list of all tickers
    tickers = driver.find_elements(By.CSS_SELECTOR, 'a.tv-screener__symbol')
    
    for ticker in tickers:
        print("Row " + ticker.text + " ")
    

    Edit: Since the previous solution doesn't seem to work with your setup, here's a different approach you can try. The page loads 150 rows at a time, so instead of counting the visible rows, we can take the total number of matches we expect (e.g. 4894) and divide it by 150 to get the number of scrolls needed. If we scroll at least that many times, all of the rows should, in theory, be visible, and the code can continue.

    from time import sleep
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.common.exceptions import TimeoutException
    
    url = 'http://www.tradingview.com/screener'
    driver = webdriver.Chrome(service=Service('./chromedriver'))
    driver.get(url)
    
    try:
        # Wait up to 10 seconds for the match count to appear, then parse it
        selector = '.js-field-total.tv-screener-table__field-value--total'
        condition = EC.visibility_of_element_located((By.CSS_SELECTOR, selector))
        matches = WebDriverWait(driver, 10).until(condition)
        matches = int(matches.text.split()[0])
    
    except (TimeoutException, ValueError, IndexError):
        # Element never appeared, or its text did not start with a number
        print('Problem finding matches, setting default...')
        matches = 4895  # Set default
    
    # The page loads 150 rows at a time; divide matches by
    # 150 to determine the number of times we need to scroll;
    # add 5 extra scrolls just to be sure
    num_loops = int(matches / 150 + 5)
    
    for _ in range(num_loops):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        sleep(2)  # Pause briefly to allow loading time
    
    # will give a list of all tickers
    tickers = driver.find_elements(By.CSS_SELECTOR, 'a.tv-screener__symbol')
    
    n_tickers = len(tickers)
    
    msg = 'Correct ' if n_tickers == matches else 'Incorrect '
    msg += 'number of tickers ({}) found'
    print(msg.format(n_tickers))
    
    for ticker in tickers:
        print("Row " + ticker.text + " ")
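The two approaches in this answer can also be combined so that the loop can never run forever: keep scrolling until the row count reaches the total, but give up after ceil(matches / 150) + 5 scrolls. This is a minimal sketch, assuming the 150-rows-per-load behavior described above; `scroll_until_rows` and the `count_rows` callback are hypothetical names, and the caller has to supply the page-specific row counting.

```python
import math
import time


def scroll_until_rows(driver, count_rows, total, page_size=150,
                      extra_scrolls=5, pause=2.0):
    """Scroll until count_rows() reports `total` rows, or give up.

    count_rows: caller-supplied callable (hypothetical) returning the
    number of rows currently in the table.
    The scroll cap follows the answer's arithmetic: enough scrolls for
    `total / page_size` batches, plus a few extra.
    Returns True if all rows became visible, False otherwise.
    """
    max_scrolls = math.ceil(total / page_size) + extra_scrolls
    for _ in range(max_scrolls):
        if count_rows() >= total:
            return True
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # allow the next batch of rows to load
    # One last check after the final scroll
    return count_rows() >= total


# Usage with the answer's page-specific selectors, given a live `driver`:
#   def count_rows():
#       tbodies = driver.find_elements(By.CLASS_NAME, 'tv-data-table__tbody')
#       return len(tbodies[1].find_elements(By.TAG_NAME, 'tr'))
#   scroll_until_rows(driver, count_rows, matches)
```

For 4894 matches the cap works out to 33 + 5 = 38 scrolls, so a page that stops loading new rows ends the loop instead of hanging it.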