I have some JavaScript code running in node.js which controls puppeteer to automate tasks in a web browser.
This code gets a list of links on the page and outputs them to the console:
const links = await page.evaluate(() => { return [...document.querySelectorAll('a')].map(({href, innerText}) => ({href, innerText})); });
links.forEach(a => console.log(`<a href="${a.href}">${a.innerText.trim()}</a>`));
If I remove the map() like this:
const links = await page.evaluate(() => { return [...document.querySelectorAll('a')]; });
links.forEach(a => console.log(`<a href="${a.href}">${a.innerText.trim()}</a>`));
Then I get this error:
TypeError: Cannot read properties of undefined (reading 'trim')
Is there any way to work directly on the original array, without having to make a copy of the array using map()?
There are a couple of hundred properties on each <a> link, which I'd have to type out one at a time in the map() if I wanted to use many of them.
As an aside, is there any way to combine the 2 lines of code into 1?
If I change it to this:
await page.evaluate(() => { return [...document.querySelectorAll('a')].map(({href, innerText}) => ({href, innerText})); })
.forEach(a => console.log(`<a href="${a.href}">${a.innerText.trim()}</a>`));
Then I get this error:
TypeError: page.evaluate(...).forEach is not a function
I also found that it doesn't seem to be possible to do a console.log() whilst inside a page.evaluate() (I get no output). This is why I moved the forEach on to a 2nd line.
IT goldman has correctly identified why trying to return an array of Nodes won't work--HTML elements aren't serializable.
It's possible to remove the map and operate on the original objects, but it will result in worse code. Mutating the original array of nodes to make them serializable isn't a good idea, since it's risky to modify objects you don't own.
Avoid premature optimization. It's OK to copy by default and only switch to in-place modification once you encounter a bottleneck and have profiled and determined that in-place modification really does account for the performance issue--highly unlikely.
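If the real concern is typing out many property names, one option is to copy every primitive-valued property in a loop. The following is a minimal sketch, not a Puppeteer feature: collectPrimitiveProps is a hypothetical helper name, and it assumes you only need string, number, and boolean properties. It is written to be self-contained because Puppeteer serializes the callback's source and runs it in the browser.

```javascript
// Hypothetical helper: meant to be passed to page.$$eval, so it must not
// reference anything defined outside its own body.
function collectPrimitiveProps(els) {
  return els.map((el) => {
    const out = {};
    // for...in also visits inherited enumerable properties such as href.
    for (const key in el) {
      let value;
      try {
        value = el[key];
      } catch {
        continue; // skip any getter that throws
      }
      const type = typeof value;
      if (type === "string" || type === "number" || type === "boolean") {
        out[key] = value; // keep only values that serialize cleanly
      }
    }
    return out;
  });
}

// Usage sketch (assumes an existing Puppeteer `page`):
// const links = await page.$$eval("a", collectPrimitiveProps);
```

This still makes a copy, but you no longer list properties by hand. Nested objects, functions, and other nodes are dropped on purpose.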
As far as the Puppeteer API goes, you can immediately simplify
await page.evaluate(() =>
[...document.querySelectorAll("a")].map(...)
);
to
await page.$$eval("a", els => els.map(...));
The parameter els passed to the callback is a regular array, so .map is available without a spread.
I also found that it doesn't seem to be possible to do a console.log() whilst inside a page.evaluate() (I get no output). This is why I moved the forEach on to a 2nd line.
By default, the browser console output goes to your browser, not Node, because that's the environment the evaluate callback runs in.
You can forward the browser console to Node, but whether that's appropriate or not is unclear. You haven't provided much context for what you're doing here, or why you're mapping links back to formatted links with stripped attributes (you might want to use .outerHTML instead, depending on what you're actually trying to achieve).
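If you do want to see browser-side logs in Node, a minimal sketch looks like the following. It relies on Puppeteer's "console" page event and the message's text() method; forwardBrowserConsole and the "[browser]" prefix are hypothetical names chosen for illustration.

```javascript
// Hypothetical helper: relays browser console messages to a Node-side logger.
// `page` is assumed to be a Puppeteer Page (anything with .on("console", ...)).
function forwardBrowserConsole(page, log = console.log) {
  page.on("console", (msg) => log(`[browser] ${msg.text()}`));
}

// Usage sketch (assumes an existing Puppeteer `page`):
// forwardBrowserConsole(page);
// await page.evaluate(() => console.log("hello from the page"));
```

Register the listener before calling evaluate, otherwise earlier messages are missed.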
I'd avoid smushing multiple lines onto one. Let two lines be two lines (or more)--just write clear code and use an autoformatter. await is not amenable to chaining or one-liners (by design!), so I'd avoid the (await foo()).property antipattern in favor of two lines.
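The "forEach is not a function" error comes from calling .forEach on the Promise itself rather than on the array it resolves to. Here is a small sketch of that difference that runs without a browser; fetchLinks is a hypothetical stand-in for the page.evaluate or page.$$eval call.

```javascript
// Hypothetical stand-in for page.$$eval(...): any async function returns a Promise.
async function fetchLinks() {
  return ['<a href="https://example.com/">Example</a>'];
}

async function main() {
  const pending = fetchLinks();
  // A Promise has no forEach method, hence the TypeError when chaining directly.
  const promiseHasForEach = typeof pending.forEach === "function";

  // Awaiting first yields the array, which does have forEach.
  const links = await pending;
  links.forEach((link) => console.log(link));

  return { promiseHasForEach, count: links.length };
}
```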
Consider
const links = await page.$$eval("a", els =>
els.map(a => `<a href="${a.href}">${a.textContent.trim()}</a>`)
);
links.forEach(link => console.log(link)); // forEach(console.log) would also print the index and the whole array
or
const links = await page.$$eval("a", els => els.map(el => el.outerHTML));
links.forEach(link => console.log(link));
Generally, prefer .textContent to .innerText.
Note also that it's possible for <a> elements to not have an href, so you might want to adjust your selector to a[href].
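Putting the selector change together with the earlier mapping, a sketch might look like this. formatLinks is a hypothetical name; defining it as a named function just makes it easy to try out in Node with plain objects.

```javascript
// Hypothetical callback for page.$$eval: receives an array of matched elements.
function formatLinks(els) {
  return els.map((a) => `<a href="${a.href}">${a.textContent.trim()}</a>`);
}

// Usage sketch (assumes an existing Puppeteer `page`); a[href] skips
// anchors that have no href attribute:
// const links = await page.$$eval("a[href]", formatLinks);
// links.forEach((link) => console.log(link));
```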