I'm developing a crawler for a service that is built as a Single Page Application (SPA).
I am unsure whether the framework used is React, Angular, or something else. My crawler interacts with a table that is displayed after clicking a button with the ID queries-process-button. The table has the class i3WFpf.
When I first crawl the table, everything works perfectly, and the data is correctly retrieved. However, when I navigate to another menu and try to crawl the new table data, my crawler still returns the data from the initial table, even though the table content has visually updated on the page.
Interestingly, when I reload the entire page and then crawl the new table, the correct data is retrieved. But again, the problem persists when I attempt to crawl any subsequent table.
Here’s a snippet of the code I'm using in the browser's console:
const table = document.querySelector('table.i3WFpf');
if (table) {
    const rows = table.rows;
    for (let i = 0; i < rows.length; i++) {
        let cells = rows[i].cells;
        let rowData = [];
        for (let j = 0; j < cells.length; j++) {
            rowData.push(cells[j].innerText);
        }
        console.log(`Row ${i + 1}:`, rowData);
    }
} else {
    console.log("Table with class 'i3WFpf' not found.");
}
How can I modify my crawler to correctly retrieve the updated table data without needing to reload the page?
Is there a way to ensure that the crawler always fetches the latest data displayed in the table, particularly in the context of SPAs?
const table = document.querySelector('table.i3WFpf');
querySelector will only return the first element that matches the selector.
If the site you want to scrape loads additional data into a table that matches that same selector, and only hides the original table but keeps it in the DOM, then you need to switch to querySelectorAll and pick the last (presumably) matching element it finds.
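A minimal sketch of that idea, assuming the newest table really is the last match in document order (I can't verify that for your site). The helper name readLatestTable and its root parameter are made up for illustration; in the console you would pass document as the root.

```javascript
// Hypothetical helper: returns the cell text of the last table matching
// `selector`, or null if nothing matches. `root` is any object that has
// querySelectorAll, normally `document`.
function readLatestTable(root, selector) {
  const tables = Array.from(root.querySelectorAll(selector));
  if (tables.length === 0) {
    return null;
  }
  // Assumption: stale tables stay in the DOM and the current one is last.
  const table = tables[tables.length - 1];
  // Build one array of cell texts per row.
  return Array.from(table.rows, (row) =>
    Array.from(row.cells, (cell) => cell.innerText)
  );
}

// Usage in the browser console (skipped where there is no DOM):
if (typeof document !== 'undefined') {
  const data = readLatestTable(document, 'table.i3WFpf');
  if (data) {
    data.forEach((rowData, i) => console.log(`Row ${i + 1}:`, rowData));
  } else {
    console.log("Table with class 'i3WFpf' not found.");
  }
}
```

Because the lookup runs again on every call, nothing is cached between menus. If the old table is hidden with display: none, filtering the matches on offsetParent !== null before picking one may be more reliable than taking the last match, but that depends on how the page hides it.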