I am scraping a dynamic website built with JavaScript and React (a blockchain explorer). I tried to write a program that scrapes the page with JavaScript running. However, it returns null for the element I am looking for. When I double-check the element in the page source via Chrome, the element exists and has a value. Why does that happen? What can I do to solve the issue? (By the way, I tried to find the element by class name, CSS selector, and XPath as well, but none of them worked.)
For your information: class name = tag-item address-tag info-tag tag-item-fit-content; CSS selector = .tag-item; XPath = /html/body/div[1]/div[1]/div[2]/main/div[1]/div/div[1]/div[1]/div[1]/div[2]/div[1]/div
I am looking for the address label. The element exists with the text content USDT Token. The API does not return labels for some kinds of addresses on this website, which is why I am using this method.
When I scrape it, I get null.
const puppeteer = require('puppeteer');
const fs = require('fs').promises;
(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://tronscan.org/#/address/TAzsQ9Gx8eqFNFSKbeXrbi45CuVPHzA8wr', {
    waitUntil: 'networkidle0',
  });
  const cookies = await page.cookies()
  console.log(cookies);
  const f = await page.$$('.tag-item')
  const text = await (await f.getProperty('textContent')).jsonValue()
  console.log("label is: " + text)
  const html = await page.content();
  await fs.writeFile('reactstorefront.html', html);
  await browser.close();
})();
Output:
[]
const text = await (await f.getProperty('textContent')).jsonValue()
                            ^
TypeError: f.getProperty is not a function
Node.js v20.15.0
After checking, I think this error probably occurs because the element is null.
I have no idea why the value is null. I wonder whether the cause is the cookies or React, but the code above is supposed to handle both of those already. I cannot check where the problem is, because the script now throws and no longer writes an updated reactstorefront.html.
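page.$ and page.$$ are instantaneous selections, so if the element is added to the page asynchronously, these methods won't wait to pick it up. Note also that page.$$ returns an array of element handles (empty when nothing matches), which is why f.getProperty is not a function; page.$ returns a single handle or null. Use a locator or waitForSelector: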
const puppeteer = require("puppeteer"); // ^22.10.0
const url = "<Your URL>";
let browser;
(async () => {
  browser = await puppeteer.launch();
  const [page] = await browser.pages();
  const ua =
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36";
  await page.setUserAgent(ua);
  await page.goto(url, {waitUntil: "domcontentloaded"});
  const item = await page.waitForSelector(".tag-item");
  const text = await item.evaluate(el => el.textContent);
  console.log(`label is: ${text}`); // => label is: Binance-Hot 5
})()
  .catch(err => console.error(err))
  .finally(() => browser?.close());
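The code above uses waitForSelector; the locator variant mentioned earlier might look roughly like the sketch below. It assumes a Puppeteer version that ships the locator API. The helper name readLabel and the 60-second timeout are illustrative choices, not something taken from the site or the original code.

```javascript
// Hypothetical helper: waits until `selector` resolves, then returns its trimmed text.
// `page` is expected to be a Puppeteer Page from a version that supports locators.
async function readLabel(page, selector, timeoutMs = 60_000) {
  const text = await page
    .locator(selector)
    .setTimeout(timeoutMs) // assumption: a slow page may need a generous timeout
    .map(el => el.textContent) // evaluated in the browser once the element is found
    .wait();
  return (text ?? "").trim();
}
```

Inside the async function above, `const text = await readLabel(page, ".tag-item");` would stand in for the waitForSelector and evaluate pair.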
This particular website loads very slowly, so you might want to try methodically blocking scripts that aren't necessary for retrieving the one piece of data you're after.
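As a starting point for that kind of blocking, here is a minimal sketch using Puppeteer's request interception. Which resource types and hosts are safe to drop is an assumption you would need to verify by trial and error on the actual site; shouldBlock, blockUnneededRequests, and the BLOCKED_TYPES list are hypothetical names and defaults chosen for illustration.

```javascript
// Assumption: these resource types are not needed to read text out of the DOM.
const BLOCKED_TYPES = new Set(["image", "stylesheet", "font", "media"]);

// Hypothetical helper: decides whether a single request should be aborted.
function shouldBlock(resourceType, url, blockedHosts = []) {
  if (BLOCKED_TYPES.has(resourceType)) return true;
  let host;
  try {
    host = new URL(url).hostname;
  } catch {
    return false; // not a parseable URL: let it through
  }
  // Match the host itself or any of its subdomains.
  return blockedHosts.some(h => host === h || host.endsWith(`.${h}`));
}

// Call this before page.goto(); `page` is a Puppeteer Page.
async function blockUnneededRequests(page, blockedHosts = []) {
  await page.setRequestInterception(true);
  page.on("request", req => {
    if (req.isInterceptResolutionHandled()) return; // another handler already decided
    if (shouldBlock(req.resourceType(), req.url(), blockedHosts)) {
      req.abort();
    } else {
      req.continue();
    }
  });
}
```

Start with a short block list and add hosts one at a time, re-running the scrape after each change, so that you notice as soon as a blocked script turns out to be the one that renders the label.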
See "Why does headless need to be false for Puppeteer to work?" for an explanation of why the user agent header was added.