javascript puppeteer

Puppeteer case insensitive text locator with variable from Node scope


I'm attempting to locate an element whose innerText is defined in a variable. I've reviewed the question "Puppeteer: search for inner text case insensitive", but it looks like its answers only work for an older version of Puppeteer.

I'm using "puppeteer": "^24.4.0".

This works:

const puppeteer = require('puppeteer')

;(async () => {
  const browser = await puppeteer.launch({ headless: true })
  const page = await browser.newPage()
  await page.goto('https://www.google.com', { waitUntil: 'domcontentloaded' })

  // Approach 1: toLowerCase with hardcoded text
  try {
    const aboutButton = await page
      .locator('a')
      .setTimeout(10000) // Increased timeout
      .filter((el) => el.innerText.toLowerCase() === 'About'.toLowerCase())
      .waitHandle()
    console.log(
      'About (toLowerCase):',
      await aboutButton.evaluate((el) => el.href)
    )
  } catch (error) {
    console.log('About (toLowerCase) error:', error.message)
  }
  await browser.close()
})()

It fails if I place the text in a variable.

const puppeteer = require('puppeteer')

;(async () => {
  const browser = await puppeteer.launch({ headless: true })
  const page = await browser.newPage()
  await page.goto('https://www.google.com', { waitUntil: 'domcontentloaded' })

  // Approach 2: variable with hardcoded text
  const aboutText = 'About'
  try {
    const aboutButton = await page
      .locator('a')
      .setTimeout(5000)
      .filter((el) => el.innerText.toLowerCase() === aboutText.toLowerCase())
      .waitHandle()
    console.log(
      'About (toLowerCase):',
      await aboutButton.evaluate((el) => el.href)
    )
  } catch (error) {
    console.log('About (toLowerCase) error:', error.message)
  }

  await browser.close()
})()

I've also tried the solutions from the linked question, which have the same issue.

This works:

const aboutButton = await page.evaluateHandle(() =>
  [...document.querySelectorAll('a')].find((s) =>
    s.innerText.toLowerCase().match('About'.toLowerCase())
  )
)

This fails:

const aboutText4 = 'About'
const aboutButton = await page.evaluateHandle(() =>
  [...document.querySelectorAll('a')].find((s) =>
    s.innerText.toLowerCase().match(aboutText4.toLowerCase())
  )
)

I see this error message from the final example. I clearly don't understand how scope works here, and I'm not sure how to make this work with a variable.

About (toLowerCase) error: aboutText4 is not defined


Solution

  • The root issue, which I think you understand, is that these functions don't run in the Node context; they run in the browser context (the page being automated). As a result, nothing from Node's scope is available inside them.
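
    To make the scope problem concrete, here is a minimal sketch that imitates it without a browser. It assumes that shipping a callback to another context amounts to turning it into source text and evaluating it elsewhere. The helper runInFreshContext is hypothetical and uses Node's vm module as a stand-in for the page, so this is not Puppeteer's actual implementation.

```javascript
const vm = require("node:vm");

// Hypothetical stand-in for sending a callback to the page: the function is
// converted to source text and evaluated in a separate context. Arguments
// travel as serialized data, not as live references.
function runInFreshContext(fn, ...args) {
  const source = `(${fn.toString()})(...${JSON.stringify(args)})`;
  return vm.runInNewContext(source);
}

const aboutText = "About";

// Closing over a Node variable: the name does not exist in the other context.
let closureError = null;
try {
  runInFreshContext(() => aboutText.toLowerCase());
} catch (err) {
  closureError = err;
}
console.log(closureError.name, closureError.message);
// ReferenceError aboutText is not defined

// Passing the value as an argument: it is serialized along with the call.
const passed = runInFreshContext((text) => text.toLowerCase(), aboutText);
console.log(passed); // about
```

    The first call fails in the same way as the code in the question, and the second mirrors the fix used in the examples that follow: hand the value over explicitly instead of relying on a closure.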

    One option is to pass a string in to .filter() instead of a function:

    const puppeteer = require("puppeteer"); // ^24.4.0
    
    const html = `<!DOCTYPE html><html><body>
    <script>
    setTimeout(() => {
      document.body.innerHTML = "<a>About</a>";
    }, 3000);
    </script>
    </body></html>`;
    
    let browser;
    (async () => {
      browser = await puppeteer.launch();
      const [page] = await browser.pages();
      await page.setContent(html);
      const target = "about";
      const el = await page
        .locator("a")
        .filter(`el => el.textContent.toLowerCase() === "${target}"`)
        .waitHandle();
      console.log(await el.evaluate(el => el.textContent)); // => About
    })()
      .catch(err => console.error(err))
      .finally(() => browser?.close());
    

    (google.com has bot blocking, so a hardcoded HTML string makes the example more reproducible.)

    Alternatively, you can use waitForFunction:

    const el = await page.waitForFunction(
      target =>
        [...document.querySelectorAll("a")].find(
          el => el.textContent.toLowerCase() === target
        ),
      {},
      target
    );
    

    Or, yet again, pass a string, since a simple one-line function like this is amenable to it:

    const el = await page.waitForFunction(`
      [...document.querySelectorAll("a")].find(
        el => el.textContent.toLowerCase() === "${target}"
      )
    `);
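
    One caveat with the string-based variants: interpolating target between hand-written quotes produces invalid source if the text itself contains a double quote or a backslash. A possible safeguard, sketched below, is to let JSON.stringify emit the string literal. The helper buildPredicateSource is a hypothetical name, and the sources are only parsed locally with new Function to show the difference.

```javascript
// Hypothetical helper: builds the predicate source that would be passed as a
// string. JSON.stringify emits a quoted, escaped JavaScript string literal.
function buildPredicateSource(target) {
  const literal = JSON.stringify(target.toLowerCase());
  return `el => el.textContent.toLowerCase() === ${literal}`;
}

// Checks only whether the source parses; nothing is sent to a browser here.
function parses(source) {
  try {
    new Function(`return (${source});`);
    return true;
  } catch {
    return false;
  }
}

const tricky = 'say "about"';
const naive = `el => el.textContent.toLowerCase() === "${tricky}"`;
const safe = buildPredicateSource(tricky);

console.log(parses(naive)); // false
console.log(parses(safe)); // true

// A plain object stands in for a DOM element.
const predicate = new Function(`return (${safe});`)();
console.log(predicate({ textContent: 'Say "About"' })); // true
```

    With this helper, the string handed to .filter() or waitForFunction would be built from buildPredicateSource(target) rather than from a template with literal quotes.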
    

    evaluateHandle doesn't auto-wait and is discouraged, but just as a proof of concept, here's how you can use it:

    const el = await page.evaluateHandle(
      (target) =>
        [...document.querySelectorAll("a")].find(el =>
          el.textContent.toLowerCase() === target.toLowerCase() // not a regex
        ),
      aboutText4 // <-- use the second argument to pass data into the browser
    );
    

    I would also call .trim() after .toLowerCase().
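
    As a rough illustration of why trimming helps, the sketch below compares normalized text using plain objects in place of DOM elements so that it runs in Node. The helper normalizeText is a hypothetical name and not part of Puppeteer.

```javascript
// Hypothetical helper: lowercases and trims so "\n  About\n" matches "about".
// textContent can be null for some node types, hence the fallback.
const normalizeText = (text) => (text ?? "").toLowerCase().trim();

// Plain objects stand in for DOM elements so the sketch runs in Node.
const links = [
  { textContent: "  Store " },
  { textContent: "\n  About\n" },
];

const target = "ABOUT";
const match = links.find(
  (el) => normalizeText(el.textContent) === normalizeText(target)
);
console.log(JSON.stringify(match.textContent)); // "\n  About\n"
```

    In a real script the normalization has to happen inside the function that runs in the page, since helpers defined in Node are not visible there.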

    See also "How do you click on an element with text in Puppeteer?"