I am trying to scrape data from this website: https://data.anbima.com.br/debentures/AALM11/agenda?page=1&size=100& When I look at DevTools > Elements, the page has a TABLE tag with the data inside TR and TD tags (dates, values, etc.). But when I grab the HTML with Selenium or parse it with bs4, the data is missing and I see <div class="skeleton-container" aria-hidden="true"> placeholders instead. How can I extract the information I need?
My code
from selenium import webdriver
from selenium.webdriver.common.by import By
driver = webdriver.Chrome()
deb = 'AALM11'
link_agenda = 'https://data.anbima.com.br/debentures/' + deb + '/agenda?page=1&size=100'
driver.get(link_agenda)
html_source = driver.find_element(By.TAG_NAME, 'table').get_attribute('outerHTML')
The result
<table id="" class="anbima-ui-table anbima-ui-table-responsive anbima-ui-table-mobile">
<thead>
<tr>
<th><span style="width: 80px;"><div class="skeleton-container" aria-hidden="true" style="width: 80px; height: 18px; margin-top: 0px;"></div></span></th>
<th><span style="width: 110px;"><div class="skeleton-container" aria-hidden="true" style="width: 100px; height: 18px; margin-top: 0px;"></div></span></th>
<th><span style="width: 110px;"><div class="skeleton-container" aria-hidden="true" style="width: 45px; height: 18px; margin-top: 0px;"></div></span></th>
<th><span style="width: 110px;"><div class="skeleton-container" aria-hidden="true" style="width: 90px; height: 18px; margin-top: 0px;"></div></span></th>
<th><span style="width: 110px;"><div class="skeleton-container" aria-hidden="true" style="width: 55px; height: 18px; margin-top: 0px;"></div></span></th>
<th><span style="width: 80px;"><div class="skeleton-container" aria-hidden="true" style="width: 45px; height: 18px; margin-top: 0px;"></div></span></th>
</tr>
</thead>
<tbody>
<tr>
<td><span><div class="skeleton-container" aria-hidden="true" style="width: 75px; height: 18px; margin-top: 0px;"></div></span></td>
<td><span><div class="skeleton-container" aria-hidden="true" style="width: 75px; height: 18px; margin-top: 0px;"></div></span></td>
<td><span><div class="skeleton-container" aria-hidden="true" style="width: 125px; height: 18px; margin-top: 0px;"></div></span></td>
<td><span><div class="skeleton-container" aria-hidden="true" style="width: 75px; height: 18px; margin-top: 0px;"></div></span></td>
<td><span><div class="skeleton-container" aria-hidden="true" style="width: 100px; height: 18px; margin-top: 0px;"></div></span></td>
<td><span><div class="skeleton-container" aria-hidden="true" style="width: 100px; height: 18px; margin-top: 0px;"></div></span></td>
</tr>
...
I was expecting to see this instead
<table id="" class="anbima-ui-table anbima-ui-table-responsive agenda-ativo-page__table--liquidado-1 agenda-ativo-page__table--liquidado-2 agenda-ativo-page__table--liquidado-3 agenda-ativo-page__table--liquidado-4 agenda-ativo-page__table--liquidado-5 agenda-ativo-page__table--liquidado-6 agenda-ativo-page__table--liquidado-7 agenda-ativo-page__table--liquidado-8 agenda-ativo-page__table--liquidado-9 agenda-ativo-page__table--liquidado-10 ">
<thead>
<tr>
<th><span style="width: 80px;">Data do evento</span></th>
<th><span style="width: 110px;">Data de liquidação</span></th>
<th><span style="width: 110px;">Evento</span></th>
<th><span style="width: 110px;">Percentual / Taxa</span></th>
<th><span style="width: 110px;">Valor pago</span></th>
<th><span style="width: 80px;">Status</span></th>
</tr>
</thead>
<tbody>
<tr>
<td><span id="agenda-data-evento-0" class="normal-text">13/01/2022</span></td>
<td><span id="agenda-data-liquidacao-0" class="normal-text">13/01/2022</span></td>
<td><span id="agenda-evento-0" class="normal-text">Pagamento de juros</span></td>
<td><span id="agenda-taxa-0" class="normal-text">4,3500 %</span></td>
<td><span id="agenda-valor-0" class="normal-text">R$ 53,434259</span></td>
<td><span id="agenda-status-0" class="anbima-ui-flag anbima-ui-flag--small anbima-ui-flag--small--green " style="max-width: 96px;"><label class="flag__children">Liquidado</label></span></td>
</tr>
...
The problem is that the table data is loaded dynamically. driver.get() returns as soon as the browser reports the page as loaded, but at that point the page is still loading the table data in the background and showing skeleton placeholders in its place. Your code runs immediately, so it scrapes the partially loaded page. To fix this, we need to wait for something that indicates the data has arrived. I chose to wait until the <div class="skeleton-container" ...> placeholders are gone. Once they disappear, the table data has finished loading and is available in the DOM.
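Conceptually, an explicit wait is just a polling loop: check a condition, sleep briefly, and give up after a timeout. The sketch below illustrates that idea with the standard library only. `wait_until` and `FakePage` are hypothetical names made up for this illustration, and this is not Selenium's actual implementation (WebDriverWait raises its own TimeoutException and, by default, ignores NoSuchElementException while polling).

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll_interval)


# Hypothetical stand-in for a page whose table shows skeleton placeholders
# on the first two checks and real data from the third check onward.
class FakePage:
    def __init__(self):
        self.checks = 0

    def table_ready(self):
        self.checks += 1
        return self.checks >= 3


page = FakePage()
wait_until(page.table_ready, timeout=2.0, poll_interval=0.01)
print(page.checks)  # number of polls it took before the table was "ready"
```

With Selenium, the condition would be something like "no skeleton placeholder is visible any more" instead of the fake `table_ready` check.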
Working code...
from selenium import webdriver
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.maximize_window()

deb = 'AALM11'
link_agenda = 'https://data.anbima.com.br/debentures/' + deb + '/agenda?page=1&size=100'
driver.get(link_agenda)

# Explicit wait: poll for up to 10 seconds, then raise TimeoutException.
wait = WebDriverWait(driver, 10)
# The condition looks up the first element matching the locator and passes
# once that element is hidden or no longer present in the DOM.
wait.until(EC.invisibility_of_element_located((By.CSS_SELECTOR, "div.skeleton-container")))

# The placeholders are gone, so the table now contains the real data.
table = driver.find_element(By.CSS_SELECTOR, "table")
print(table.get_attribute('outerHTML'))
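
Once the outerHTML contains real data, the cell text still has to be pulled out of the markup. Here is a minimal sketch of one way to do that using only the standard library's html.parser. `TableTextParser` and `rows_from_table_html` are hypothetical names for this example, and the sample HTML is a trimmed-down version of the expected output from the question (two columns instead of six). bs4 would work just as well if you already use it.

```python
from html.parser import HTMLParser


class TableTextParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being parsed, or None
        self._cell = None  # text fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def rows_from_table_html(table_html):
    """Return the table as a list of rows, each a list of cell texts."""
    parser = TableTextParser()
    parser.feed(table_html)
    parser.close()
    return parser.rows


# Sample input trimmed from the expected table shown in the question.
sample = """
<table><thead><tr>
<th><span style="width: 80px;">Data do evento</span></th>
<th><span style="width: 110px;">Evento</span></th>
</tr></thead><tbody><tr>
<td><span id="agenda-data-evento-0" class="normal-text">13/01/2022</span></td>
<td><span id="agenda-evento-0" class="normal-text">Pagamento de juros</span></td>
</tr></tbody></table>
"""

header, *data = rows_from_table_html(sample)
print(header)
print(data)
```

In the Selenium script above, you would pass table.get_attribute('outerHTML') instead of `sample`. Skeleton cells contain no text, so if the wait is skipped this returns rows of empty strings, which makes a partially loaded page easy to spot.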