python selenium-webdriver web-scraping automation pagination

How to scrape a website that has <span class="ellipsis">…</span> in between numbers on a dynamic table with Selenium in Python


I am trying to scrape dividend data for the stock "Vale" on the site https://investidor10.com.br/acoes/vale3/. The dividend table has 8 numbered page buttons (1, 2, 3, ..., 8) plus "Next" and "Previous" buttons. My script can scrape the first 5 pages, but after clicking the button with idx="5", the index jumps to idx="8", so the data from the 6th, 7th, and 8th pages is missed.

Despite trying everything I found on YouTube, Reddit, and Google, I still cannot fix the issue.

Here is the code I am using:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from time import sleep

# driver initialization and driver.get(...) are done elsewhere (not shown)

def iterar_botao():
    botoes = driver.find_elements(By.CSS_SELECTOR, "a[data-dt-idx]")
    qtd_botoes = len(botoes)
    
    for i in range(qtd_botoes):
        clicar_botao(str(i+1))

def clicar_botao(idx):
    try:
        localizador = (By.CSS_SELECTOR, f'a[data-dt-idx="{idx}"]')
        botao = WebDriverWait(driver, 10).until(EC.presence_of_element_located(localizador))
        
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        sleep(1)
        driver.execute_script("arguments[0].scrollIntoView({behavior:'instant', block:'center' });", botao)
        driver.execute_script("arguments[0].click();", botao)
        
        WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.ID, "table-dividends-history")))
        pegar_tabelas()  # Function to scrape the tables (not shown here)
    except Exception as e:
        print(f"Failed to execute function: {e}")

And this is the error I get:

Failed to execute function: Message: RemoteError@chrome://remote/content/shared/RemoteError.sys.mjs:8:8
WebDriverError@chrome://remote/content/shared/webdriver/Errors.sys.mjs:199:5
NoSuchElementError@chrome://remote/content/shared/webdriver/Errors.sys.mjs:552:5

I tried adding waits and sleeps to ensure elements are properly loaded.

I've tried debugging by printing the button idx values before clicking.

I checked if the NoSuchElementError was caused by a wrong element locator, but the button exists on the page.


Solution

  • Here's how you can get the desired result. The root cause of the original error is that DataTables renumbers the data-dt-idx attributes as the pagination window (the one with the … ellipsis) shifts, so clicking by a fixed index skips pages; repeatedly clicking the "Next" button instead is reliable. The steps:

    1. First, navigate to the target page: https://investidor10.com.br/acoes/vale3/

    2. Wait for the dividends section to be present

      • Uses an explicit wait for #dividends-section to ensure the dividends area is rendered before proceeding.
    3. Next, locate the table wrapper and bring it into view

    4. Then capture the table element and headers

    5. Extract Page 1 data

      • Calls processing(table) to scrape all visible rows on the first page into row_list.
    6. Finally, paginate through the table

      • Inside the loop:

        • Tries to locate the next pagination button with CSS: #table-dividends-history_paginate > a.paginate_button.next

        • Attempts to click it. If an ElementClickInterceptedException occurs (e.g., overlay or animation), it silently retries on the next iteration.

        • After a successful click, waits 1 second for the next table to load, and calls processing(table) again to append the new rows.

        • If the next button is not found (NoSuchElementException), stops the loop, meaning the last page has been processed.

    7. Assemble the final table:

      • Builds a pandas.DataFrame from row_list using table_header_list as the column names, and prints the resulting table, which now contains the concatenated rows from all paginated pages.

    And here's the implementation code:

    import time
    import pandas as pd
    from selenium.webdriver import Chrome, ChromeOptions
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.wait import WebDriverWait
    from selenium.common.exceptions import NoSuchElementException, ElementClickInterceptedException
    
    pd.set_option('display.max_rows', None)
    pd.set_option('display.max_columns', None)
    
    row_list = []
    
    
    def processing(tbl):
        table_rows = tbl.find_elements(By.CSS_SELECTOR, "div.dataTables_scrollBody>table>tbody>tr")
        for row in table_rows:
            row_list.append([d.text for d in row.find_elements(By.TAG_NAME, 'td')])
    
    
    options = ChromeOptions()
    options.add_argument("--start-maximized")
    options.add_experimental_option("excludeSwitches", ["enable-automation"])
    options.add_experimental_option("useAutomationExtension", False)
    
    driver = Chrome(options=options)
    wait = WebDriverWait(driver, 10)
    
    url = "https://investidor10.com.br/acoes/vale3/"
    driver.get(url)
    
    wait.until(EC.visibility_of_element_located((By.ID, "dividends-section")))
    
    dividend_table_container = driver.find_element(By.ID, "table-dividends-history_wrapper")
    driver.execute_script("arguments[0].scrollIntoView(true);", dividend_table_container)
    table = dividend_table_container.find_element(By.CSS_SELECTOR, "div.dataTables_scroll")
    table_header_list = table.find_element(By.CSS_SELECTOR, "div.dataTables_scrollHead").text.split('\n')
    print(f"Table Header {table_header_list}")
    print(f"Extracting Page 1...")
    
    processing(table)
    page_num = 2
    NEXT_PAGE_AVAILABLE = True
    while NEXT_PAGE_AVAILABLE:
        try:
            next_page = dividend_table_container.find_element(By.CSS_SELECTOR, '#table-dividends-history_paginate>a[class="paginate_button next"]')
            try:
                next_page.click()
                time.sleep(1)
                print(f"Extracting Page {page_num}...")
                processing(table)
                page_num += 1
            except ElementClickInterceptedException:
                pass
        except NoSuchElementException:
            print("Reached End Page")
            NEXT_PAGE_AVAILABLE = False
    
    
    # show the table
    df = pd.DataFrame(row_list, columns=table_header_list)
    print(df)
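One detail worth noting: the loop's stop condition relies on DataTables' pagination markup. When the last page is reached, DataTables adds disabled to the Next button's class list, so the exact-match selector a[class="paginate_button next"] no longer matches and NoSuchElementException ends the loop. An equivalent, more explicit check could inspect the class attribute directly — a small sketch, assuming the standard DataTables class names:

```python
# Assumes DataTables' default pagination markup, where the exhausted
# "Next" control carries e.g. class="paginate_button next disabled".
def next_is_enabled(class_attr: str) -> bool:
    # Treat the button as clickable only when "disabled" is absent.
    return "disabled" not in class_attr.split()

print(next_is_enabled("paginate_button next"))           # True
print(next_is_enabled("paginate_button next disabled"))  # False
```

In the Selenium loop this would be checked with next_is_enabled(next_page.get_attribute("class")) before clicking, instead of catching the exception.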
    

    output:

    Table Header ['TIPO', 'DATA COM', 'PAGAMENTO', 'VALOR']
    Extracting Page 1...
    Extracting Page 2...
    Extracting Page 3...
    Extracting Page 4...
    Extracting Page 5...
    Extracting Page 6...
    Extracting Page 7...
    Extracting Page 8...
    Reached End Page
    
               TIPO    DATA COM   PAGAMENTO       VALOR
    0          JSCP  12/08/2025  03/09/2025  1,89538700
    1    Dividendos  07/03/2025  14/03/2025  2,14184748
    2          JSCP  11/12/2024  14/03/2025  0,52053100
    3          JSCP  02/08/2024  04/09/2024  2,09379814
    4    Dividendos  11/03/2024  19/03/2024  2,73854837
    5    Dividendos  21/11/2023  01/12/2023  1,56589100
    6          JSCP  21/11/2023  01/12/2023  0,76577076
    7          JSCP  11/08/2023  01/09/2023  1,91847180
    8    Dividendos  13/03/2023  22/03/2023  1,82764600
    9          JSCP  12/12/2022  22/03/2023  0,29201200
    10         JSCP  11/08/2022  01/09/2022  1,53937600
    11   Dividendos  11/08/2022  01/09/2022  2,03268000
    12   Dividendos  08/03/2022  16/03/2022  3,71925600
    13   Dividendos  22/09/2021  30/09/2021  8,19723900
    14   Dividendos  23/06/2021  30/06/2021  1,47340202
    15   Dividendos  23/06/2021  30/06/2021  0,71626805
    16         JSCP  04/03/2021  15/03/2021  0,83573600
    17   Dividendos  04/03/2021  15/03/2021  3,42591000
    18         JSCP  21/09/2020  30/09/2020  0,99734400
    19   Dividendos  21/09/2020  30/09/2020  1,41016600
    20         JSCP  26/12/2019  26/12/2019  1,41436400
    21         JSCP  02/08/2018  20/09/2018  1,30861400
    22   Dividendos  02/08/2018  20/09/2018  0,17174700
    23         JSCP  06/03/2018  15/03/2018  0,48851100
    24         JSCP  21/12/2017  15/03/2018  0,41991200
    25         JSCP  20/04/2017  28/04/2017  0,90557100
    26         JSCP  01/12/2016  16/12/2016  0,16630000
    27   Dividendos  15/10/2015  30/10/2015  0,37360000
    28         JSCP  14/04/2015  30/04/2015  0,60180000
    29         JSCP  16/10/2014  31/10/2014  0,65080000
    30   Dividendos  16/10/2014  31/10/2014  0,34000000
    31         JSCP  14/04/2014  30/04/2014  0,89890000
    32   Dividendos  17/10/2013  31/10/2013  0,12060000
    33         JSCP  17/10/2013  31/10/2013  0,82370000
    34   Dividendos  16/04/2013  30/04/2013  0,15360000
    35         JSCP  16/04/2013  30/04/2013  0,71040000
    36         JSCP  16/10/2012  31/10/2012  0,52590000
    37   Dividendos  16/10/2012  31/10/2012  0,66070000
    38         JSCP  13/04/2012  30/04/2012  1,07530000
    39         JSCP  14/10/2011  31/10/2011  0,63430000
    40   Dividendos  14/10/2011  31/10/2011  0,38930000
    41   Dividendos  11/08/2011  26/08/2011  0,93340000
    42         JSCP  13/04/2011  29/04/2011  0,60820000
    43         JSCP  14/01/2011  31/01/2011  0,32000000
    44         JSCP  14/10/2010  29/10/2010  0,55520000
    45         JSCP  14/04/2010  30/04/2010  0,42170000
    46         JSCP  15/10/2009  30/10/2009  0,49200000
    47   Dividendos  15/04/2009  30/04/2009  0,52460000
    48   Dividendos  16/10/2008  31/10/2008  0,13850000
    49         JSCP  16/10/2008  31/10/2008  0,51470000
    50         JSCP  10/04/2008  30/04/2008  0,23810000
    51   Dividendos  10/04/2008  30/04/2008  0,19850000
    52         JSCP  18/10/2007  31/10/2007  0,38190000
    53   Dividendos  18/10/2007  31/10/2007  0,01220000
    54         JSCP  17/04/2007  30/04/2007  0,25730000
    ...
    71         JSCP  28/12/1999  01/03/2000  1,17000000
    72         JSCP  06/08/1999  20/08/1999  1,11000000
    73  Bonificação  18/04/1997  18/04/1997  1,00000000
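Since the scraped values keep the site's Brazilian formatting (comma as decimal separator, DD/MM/YYYY dates), a small post-processing step makes the DataFrame usable for computation. A minimal sketch using only the standard library — the helper names are my own, and the formats are taken from the output above:

```python
from datetime import datetime

def parse_valor(valor: str) -> float:
    # "1,89538700" uses a Brazilian decimal comma -> 1.895387
    return float(valor.replace(",", "."))

def parse_data(data: str) -> datetime:
    # Dates on this site are day/month/year, e.g. "12/08/2025"
    return datetime.strptime(data, "%d/%m/%Y")

print(parse_valor("1,89538700"))        # 1.895387
print(parse_data("12/08/2025").date())  # 2025-08-12
```

Applied to the DataFrame from the answer, this would look like df["VALOR"] = df["VALOR"].map(parse_valor) and similarly for the date columns.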