Tags: javascript, node.js, puppeteer

Why is Array.map() necessary? Without it, the properties are undefined. How can I eliminate the Array.map()?


I have some JavaScript code running in Node.js which controls Puppeteer to automate tasks in a web browser.

This code gets a list of links on the page and outputs them to the console:

const links = await page.evaluate(() => { return [...document.querySelectorAll('a')].map(({href, innerText}) => ({href, innerText})); });
links.forEach(a => console.log(`<a href="${a.href}">${a.innerText.trim()}</a>`));

If I remove the map() like this:

const links = await page.evaluate(() => { return [...document.querySelectorAll('a')]; });
links.forEach(a => console.log(`<a href="${a.href}">${a.innerText.trim()}</a>`));

Then I get this error:

TypeError: Cannot read properties of undefined (reading 'trim')

Is there any way to work directly on the original array, without having to make a copy of the array using map()?

There are a couple of hundred properties on each <a> link, which I'd have to type out one at a time in the map() if I wanted to use many of them.


As an aside, is there any way to combine the two lines of code into one?

If I change it to this:

await page.evaluate(() => { return [...document.querySelectorAll('a')].map(({href, innerText}) => ({href, innerText})); })
    .forEach(a => console.log(`<a href="${a.href}">${a.innerText.trim()}</a>`));

Then I get this error:

TypeError: page.evaluate(...).forEach is not a function

I also found that it doesn't seem to be possible to do a console.log() whilst inside a page.evaluate() (I get no output). This is why I moved the forEach on to a 2nd line.


Solution

  • IT goldman has correctly identified why trying to return an array of Nodes won't work: HTML elements aren't serializable.

    It's possible to remove the map and operate on the original objects, but it will result in worse code. Mutating the original array of nodes to make them serializable isn't a good idea, since it's risky to modify objects you don't own.

    Avoid premature optimization. It's OK to copy by default and only switch to in-place modification once you encounter a bottleneck, profile it, and confirm that the copy really does account for the performance issue, which is highly unlikely.

    As far as the Puppeteer API goes, you can immediately simplify

    await page.evaluate(() =>
      [...document.querySelectorAll("a")].map(...)
    );
    

    to

    await page.$$eval("a", els => els.map(...));
    

    The parameter els passed to the callback is a regular array, so .map is available without a spread.

    You wrote: "I also found that it doesn't seem to be possible to do a console.log() whilst inside a page.evaluate() (I get no output). This is why I moved the forEach on to a 2nd line."

    By default, console.log() calls inside the evaluate callback print to the browser's console, not to Node's, because the callback runs in the browser.

    You can forward the browser console to Node, but whether that's appropriate or not is unclear. You haven't provided much context for what you're doing here, or why you're mapping links back to formatted links with stripped attributes (you might want to use .outerHTML instead, depending on what you're actually trying to achieve).

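    I'd avoid smushing multiple lines onto one. Let two lines be two lines (or more): just write clear code and use an autoformatter. await is not amenable to chaining or one-liners (by design!). In your attempt, member access binds more tightly than await, so .forEach is looked up on the Promise returned by page.evaluate() before it resolves. A Promise has no forEach method, hence the error. I'd avoid the (await foo()).property antipattern in favor of two lines.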

    Consider

    const links = await page.$$eval("a", els =>
      els.map(a => `<a href="${a.href}">${a.textContent.trim()}</a>`)
    );
    links.forEach(link => console.log(link));
    

    or

    const links = await page.$$eval("a", els => els.map(el => el.outerHTML));
    links.forEach(link => console.log(link));
    

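    Generally, prefer .textContent to .innerText. innerText is CSS-aware: it triggers layout and skips text hidden by styling. textContent returns the raw text of all descendant nodes and is cheaper to read.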

    Note also that <a> elements aren't guaranteed to have an href attribute, so you might want to narrow your selector to a[href].
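Here is a simplified model of why the mapped version works and the raw elements don't. It assumes, loosely, that returning a value from the browser behaves like a JSON round trip. The real DevTools protocol handles more cases, so treat this as an illustration only.

FakeAnchor is a hypothetical stand-in for an <a> element. Like real DOM properties, its href and innerText are getters on the prototype rather than own data properties, so a JSON copy drops them.

```javascript
// Hypothetical stand-in for an <a> element. Like DOM properties, href and
// innerText are getters on the prototype; private fields are never serialized.
class FakeAnchor {
  #href;
  #text;
  constructor(href, text) {
    this.#href = href;
    this.#text = text;
  }
  get href() {
    return this.#href;
  }
  get innerText() {
    return this.#text;
  }
}

// Simplified model (assumption) of handing a value back from the browser:
// a JSON round trip keeps only own enumerable data properties.
const roundTrip = value => JSON.parse(JSON.stringify(value));

const anchors = [new FakeAnchor("https://example.com/", " Example ")];

// Returning the elements themselves: each one becomes an empty object.
const raw = roundTrip(anchors);
console.log(raw[0].href); // undefined, so raw[0].innerText.trim() would throw

// Returning plain objects built with map(): the values survive.
const mapped = roundTrip(anchors.map(({ href, innerText }) => ({ href, innerText })));
console.log(mapped[0].innerText.trim()); // Example
```

An empty object is also consistent with the error in the question. `${a.href}` evaluates without complaint (to "undefined"), and only `a.innerText.trim()` throws.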
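The question's concern about typing out hundreds of property names can be sidestepped by still using map(), but copying properties generically instead of naming them. The sketch below assumes that, as on real DOM elements, the properties are enumerable accessors on the prototype chain, so for...in visits them. snapshotPrimitives and the fake element are hypothetical names used for illustration.

```javascript
// Copy every primitive-valued property that for...in can reach, including
// inherited ones. Assumption: as on DOM elements (WebIDL attributes are
// enumerable accessors on the prototype chain), for...in visits those getters.
function snapshotPrimitives(obj) {
  const out = {};
  for (const key in obj) {
    let value;
    try {
      value = obj[key];
    } catch {
      continue; // defensive: skip any getter that throws
    }
    const type = typeof value;
    // Keep only values that survive serialization; drop objects and functions.
    if (value === null || type === "string" || type === "number" || type === "boolean") {
      out[key] = value;
    }
  }
  return out;
}

// Hypothetical stand-in for an element: enumerable accessors and a method
// on the prototype, mimicking how DOM properties are defined.
const fakeProto = {};
Object.defineProperty(fakeProto, "href", { get: () => "https://example.com/", enumerable: true });
Object.defineProperty(fakeProto, "innerText", { get: () => " Example ", enumerable: true });
Object.defineProperty(fakeProto, "parentNode", { get: () => ({}), enumerable: true });
Object.defineProperty(fakeProto, "click", { value: () => {}, enumerable: true });
const fakeAnchor = Object.create(fakeProto);

console.log(snapshotPrimitives(fakeAnchor));
// { href: 'https://example.com/', innerText: ' Example ' }

// In Puppeteer (not run here), the helper has to be defined inside the
// callback, because only the callback's source is sent to the browser:
//   const links = await page.$$eval("a", els => els.map(el => {
//     const out = {};
//     for (const key in el) { /* same filtering as above */ }
//     return out;
//   }));
```

This trades typing for bulk. Each snapshot carries hundreds of values, including large strings such as innerHTML and outerHTML. In line with the advice above, listing the few properties you actually need is usually still the clearer choice.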
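The "page.evaluate(...).forEach is not a function" error comes from operator precedence, not from Puppeteer. The sketch below reproduces it in plain Node. fakeEvaluate is a hypothetical stand-in for page.evaluate(): any async function that resolves to an array behaves the same way.

```javascript
// Hypothetical stand-in for page.evaluate(): an async function resolving
// to an array, so this runs in plain Node without Puppeteer.
async function fakeEvaluate() {
  return [{ href: "https://example.com/", innerText: " Example " }];
}

async function main() {
  const pending = fakeEvaluate();

  // Member access binds tighter than await, so
  // `await fakeEvaluate().forEach(...)` looks up forEach on the Promise,
  // not on the array it will eventually resolve to.
  console.log(typeof pending.forEach); // prints: undefined

  // Await first, then work with the resolved array.
  const links = await pending;
  const lines = links.map(a => `<a href="${a.href}">${a.innerText.trim()}</a>`);
  lines.forEach(line => console.log(line));
  return lines;
}

main().catch(err => {
  console.error(err);
  process.exitCode = 1;
});
```

`(await fakeEvaluate()).forEach(...)` would also work, because the parentheses force the await to happen first. That is the one-liner form I'd still avoid in favor of two statements.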
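If you do want the browser's console output in Node, one way is to listen for the page's "console" event. In Puppeteer, that event delivers a ConsoleMessage with .type() and .text() methods. So the sketch runs without launching Chromium, it uses a plain EventEmitter as a hypothetical stand-in for the page. forwardBrowserConsole is an illustrative name, not a Puppeteer API.

```javascript
const { EventEmitter } = require("events");

// Forward messages logged inside the page (e.g. inside an evaluate callback)
// to Node. `page` is assumed to be a Puppeteer Page, which emits "console"
// events carrying a ConsoleMessage with .type() and .text().
function forwardBrowserConsole(page, log = console.log) {
  page.on("console", msg => log(`[browser ${msg.type()}] ${msg.text()}`));
}

// Hypothetical stand-in for a Page so this sketch runs without Chromium;
// the fake message only mimics the two ConsoleMessage methods used above.
const fakePage = new EventEmitter();
const received = [];
forwardBrowserConsole(fakePage, line => received.push(line));
fakePage.emit("console", { type: () => "log", text: () => "hello from the page" });
console.log(received); // [ '[browser log] hello from the page' ]

// With a real page (not run here):
//   forwardBrowserConsole(page);
//   await page.evaluate(() => console.log("hello from the page"));
```

Whether forwarding is worthwhile depends on what you're doing. For simply printing scraped data, returning plain values and logging them in Node keeps the two environments clearly separated.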
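Passing console.log straight to forEach is tempting, but it logs more than the element. forEach calls its callback with (element, index, array), so every link would be printed along with its index and the whole array. A small sketch of the difference:

```javascript
const links = ['<a href="https://example.com/">Example</a>'];

// forEach calls its callback with (element, index, array), so
// links.forEach(console.log) would log all three for every link.
const argCounts = [];
links.forEach((...args) => argCounts.push(args.length));
console.log(argCounts); // [ 3 ]

// Wrapping console.log forwards only the element.
links.forEach(link => console.log(link));

// Alternatively, log everything in a single call.
console.log(links.join("\n"));
```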