I'm trying to scrape a table from a webpage, but the table is dynamically loaded via JavaScript and appears 5-7 seconds after page load when viewed manually.
However, when using a web scraper, the table either never appears or the wait times out. I've tried multiple approaches (Playwright, Selenium, BeautifulSoup, and Scrapy), but none of them work.
What I’ve Tried:
Waiting longer for JavaScript to render the table
Increased the timeouts (wait_for_selector in Playwright, WebDriverWait in Selenium); a sketch of the Selenium wait is below.
Added sleep() before scraping.
Ensured I was using the correct selector.
The table exists inside:
<div class="col-sm-12">
<table id="data-table" class="table ...">...</table>
</div>
I verified the selector div.col-sm-12 table#data-table in Chrome DevTools, and it matches the actual table.
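For reference, the Selenium version of that wait looked roughly like this (a sketch of the attempt, not my exact script):

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("https://www.chimwiini.com/p/chimwiini-dictionary.html")

# Wait far longer than the 5-7 seconds the table takes to appear manually
WebDriverWait(driver, 30).until(
    EC.visibility_of_element_located((By.CSS_SELECTOR, "div.col-sm-12 table#data-table"))
)

It still times out, just like the Playwright version shown further down.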
Trying different scraping tools
Playwright: wait_for_selector() times out even after 20+ seconds.
Selenium: WebDriverWait still doesn't detect the table.
BeautifulSoup: requests.get() only returns the initial page source, without the table.
Scrapy: the table is missing from the response.
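For example, a quick requests/BeautifulSoup check shows the table simply isn't in the static HTML (a minimal sketch of that diagnosis):

import requests
from bs4 import BeautifulSoup

resp = requests.get("https://www.chimwiini.com/p/chimwiini-dictionary.html")
soup = BeautifulSoup(resp.text, "html.parser")

# The table is injected later by JavaScript, so this prints None
print(soup.select_one("div.col-sm-12 table#data-table"))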
Triggering JavaScript events
Tried scrolling down (execute_script("window.scrollTo(0, document.body.scrollHeight)")).
Simulated clicks to see if the table needs interaction.
Manually checked for AJAX requests in the Network tab, but didn't find any obvious API calls.
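Roughly what those trigger attempts looked like in Selenium (a sketch, not the exact code I ran):

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://www.chimwiini.com/p/chimwiini-dictionary.html")

# Scroll to the bottom in case the table is lazy-loaded on scroll
driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")

# Click the page body in case the load needs some interaction
driver.find_element(By.TAG_NAME, "body").click()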
Issue:
The table is not present in the initial HTML response. JavaScript takes 5-7 seconds to load it, but the scrapers never seem to detect it. Everything works fine if I manually copy and paste the table's HTML and parse it with pandas (a sketch of that workaround is below).
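The manual workaround is essentially this, with the copied HTML heavily abbreviated (the real table is much longer):

from io import StringIO
import pandas as pd

# HTML copied by hand from DevTools (abbreviated here)
table_html = """
<table id="data-table">
  <thead><tr><th>Chimwiini Word</th><th>English Words</th></tr></thead>
  <tbody><tr><td>Aakhiri</td><td>Last</td></tr></tbody>
</table>
"""

# StringIO keeps newer pandas versions happy when parsing literal HTML
df = pd.read_html(StringIO(table_html))[0]
print(df)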
My Code (Using Playwright as an Example)
(But I'm open to Selenium, Scrapy, or other suggestions!)
import asyncio
from playwright.async_api import async_playwright
import pandas as pd

BASE_URL = "https://www.chimwiini.com/p/chimwiini-dictionary.html"

async def scrape_page(page):
    """Scrape the page and return table data."""
    try:
        await page.goto(BASE_URL, wait_until="load")
        await asyncio.sleep(10)  # Giving extra time for JavaScript to load

        # Try waiting for the table
        await page.wait_for_selector("div.col-sm-12 table#data-table", timeout=20000)

        # Extract table HTML
        table_html = await page.inner_html("div.col-sm-12 table#data-table")
        tables = pd.read_html(f"<table>{table_html}</table>")  # Parse with pandas
        return tables[0] if tables else None
    except Exception as e:
        print(f"Error: {e}")
        return None

async def main():
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        page = await browser.new_page()
        data = await scrape_page(page)
        await browser.close()

    if data is not None:
        print(data.head())  # Print sample data
    else:
        print("No data scraped.")

asyncio.run(main())
Here is the error:
runfile('C:/Users/Gaming/anaconda3/Lib/site-packages/spyder_kernels/untitled0.py', wdir='C:/Users/Gaming/anaconda3/Lib/site-packages/spyder_kernels')
Scraping page 1...
Error scraping page 1: Page.wait_for_selector: Timeout 20000ms exceeded.
Call log:
- waiting for locator("div.col-sm-12 table#data-table") to be visible
My Question:
How can I reliably scrape this dynamically loaded table? I'm open to Playwright, Selenium, Scrapy, or any other approach that works. Any help is greatly appreciated.
I was able to get it done with just Python/Selenium. A few things:

The table lives inside a chain of nested IFRAMEs, so the driver has to switch into each one before the table is reachable; that's why waits against the top-level document never find it.
I used WebDriverWait to wait for the desired TABLE to be visible, to ensure that the table data has finished loading.
I threw the data into a pandas DataFrame to make it print pretty. You can use it or remove it as you wish.
Here's the working code:
import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()

url = "https://www.chimwiini.com/p/chimwiini-dictionary.html"
driver.get(url)

# wait and switch into each of the IFRAMEs so we can access the TABLE
wait = WebDriverWait(driver, 10)
wait.until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR, "#main-wrapper iframe")))
wait.until(EC.frame_to_be_available_and_switch_to_it((By.ID, "sandboxFrame")))
wait.until(EC.frame_to_be_available_and_switch_to_it((By.ID, "userHtmlFrame")))

# grab the table headers
headings = []
for th in wait.until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "#data-table thead th"))):
    headings.append(th.text)

# grab each row
rows = []
for table_row in wait.until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "#data-table tbody tr"))):
    cells = []
    for cell in table_row.find_elements(By.CSS_SELECTOR, "td"):
        cells.append(cell.text)
    rows.append(cells)

# create and print the DataFrame
df = pd.DataFrame(rows, columns=headings)
print(df)

driver.quit()
It prints
Chimwiini Word English Words Chimwiini Synonyms English Synonyms
0 Aakhiri Last
1 Aarani
2 Abaari
3 Abadi Constantly, Frequently All the time, Everytime,
4 Abbay Faatduma
5 Achaari Spicy condiment, Pickle masala, chutney
6 Adabu Polite, Manners
7 Adadi Amount
8 Aduwi Enemy
9 Afisha Forgive
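For completeness: if you would rather stay with Playwright, the same nested-iframe chain can be walked with frame_locator. Below is a minimal sketch under the assumption that the frame selectors are exactly the ones the Selenium code switches through; I haven't run it against the live page:

import asyncio
from io import StringIO

import pandas as pd
from playwright.async_api import async_playwright

URL = "https://www.chimwiini.com/p/chimwiini-dictionary.html"

async def main():
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        page = await browser.new_page()
        await page.goto(URL, wait_until="load")

        # Hop through the nested iframes, mirroring the Selenium answer above
        table = (
            page.frame_locator("#main-wrapper iframe")
            .frame_locator("#sandboxFrame")
            .frame_locator("#userHtmlFrame")
            .locator("#data-table")
        )
        await table.wait_for(state="visible", timeout=30000)

        # Hand the full table HTML to pandas
        html = await table.evaluate("el => el.outerHTML")
        df = pd.read_html(StringIO(html))[0]
        print(df.head())

        await browser.close()

asyncio.run(main())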