javascript node.js web-scraping xpath puppeteer

XPath Selector in Puppeteer 22.x


I have read the newest Puppeteer v22.x documentation about XPath, but I still don't know how to use XPath in Puppeteer 22.x.

I want to click an element containing the text 'Next'. Here is the HTML for the 'Next' button:

<div class="bla bla bla" role="button" tabindex="0">Next</div>

Here is what I've observed:

  1. The class value is not static. It is randomly generated on every request or page refresh.
  2. I can't use role="button" alone to identify the button, because many other tags on that page also have role="button".

Here is what I've tried:

await page.waitForSelector("xpath/div[@role='button' and text()='Next']");
await page.waitForSelector("//div[@role='button' and text()='Next']");

Solution

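  • Puppeteer 22.x takes XPath through the xpath/ prefix. After that prefix, you generally want the expression itself to start with // so that it searches the whole HTML tree rather than only the top level, which gives xpath/// in total: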

    const puppeteer = require("puppeteer"); // ^22.7.1
    
    const html = `<div class="bla bla bla" role="button" tabindex="0">Next</div>`;
    
    let browser;
    (async () => {
      browser = await puppeteer.launch();
      const [page] = await browser.pages();
      await page.setContent(html);
      const el = await page.waitForSelector(
        "xpath///div[@role='button' and text()='Next']"
      );
      console.log(await el.evaluate(el => el.textContent)); // => Next
      await el.click();
    })()
      .catch(err => console.error(err))
      .finally(() => browser?.close());
    

    I'd also use normalize-space() rather than text() to handle the presence of whitespace.
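
As a rough sketch of the normalize-space() suggestion: the helper names buttonXPath and clickButton below are hypothetical, and page is assumed to be an existing Puppeteer Page such as the one in the script above. Note that normalize-space() with no argument compares the element's whole trimmed text, not just a single text node, so it may behave differently from text() when the button has child elements.

```javascript
// Hypothetical helper: builds a Puppeteer XPath selector for a role="button"
// div whose whitespace-normalized text equals `text`.
// Assumption: `text` contains no single quotes, since XPath 1.0 string
// literals have no escape sequence for them.
function buttonXPath(text) {
  if (text.includes("'")) {
    throw new Error("text must not contain single quotes");
  }
  return `xpath///div[@role='button' and normalize-space()='${text}']`;
}

// Hypothetical wrapper: `page` is an existing Puppeteer Page.
async function clickButton(page, text) {
  const el = await page.waitForSelector(buttonXPath(text));
  await el.click();
}

console.log(buttonXPath("Next"));
// xpath///div[@role='button' and normalize-space()='Next']
```

With this, markup such as a 'Next' label surrounded by newlines or spaces should still match, whereas text()='Next' would not.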

    Better yet, use a p-selector along with CSS: '[role="button"]::-p-text(Next)' instead of XPath (although this would be a substring text match, which may not work for your use case).
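
A minimal sketch of the p-selector variant, assuming page is an existing Puppeteer Page; the names NEXT_SELECTOR and clickNext are made up for illustration. Because the text match is a substring match, this would also match a button labelled, say, 'Next page'.

```javascript
// The selector from the answer: a CSS attribute selector combined with
// Puppeteer's -p-text pseudo-element (substring text match).
const NEXT_SELECTOR = '[role="button"]::-p-text(Next)';

// Hypothetical wrapper: `page` is an existing Puppeteer Page.
async function clickNext(page) {
  // Wait until a matching element is in the DOM, then click it.
  const el = await page.waitForSelector(NEXT_SELECTOR);
  await el.click();
}
```

If several buttons contain 'Next', a further check on the element's text (for example via el.evaluate) may be needed before clicking.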
