Tags: node.js, puppeteer, google-chrome-headless, browser-automation

Puppeteer: parse the DOM before the load event


In my Puppeteer script, I am trying to open a website whose page load event takes 4-5 minutes to fire, because the ads on the page (particularly video ads) take a very long time to load. Here is my code snippet:

  const browser = await puppeteer.launch({headless: false});
  const page = await browser.newPage();

  await page.goto(link, {timeout: 0});
  // wait until anchor with class=author is visible
  await page.waitForSelector('a[class="author"]', {timeout: 0});

In the code above, page.goto uses the default 'waitUntil': 'load' (per the Puppeteer documentation), so it takes 4-5 minutes before execution reaches the await page.waitForSelector(...) line. However, the element I am looking for is visible almost immediately, because it is part of the initial HTML rather than loaded by a script. How do I get rid of the long delay caused by waiting for the page load to finish?


Solution

  • You can wait for the domcontentloaded event instead of load:

    await page.goto(link, {waitUntil: 'domcontentloaded'});
    

    From the MDN docs on DOMContentLoaded:

    The DOMContentLoaded event is fired when the document has been completely loaded and parsed, without waiting for stylesheets, images, and subframes to finish loading (the load event can be used to detect a fully-loaded page).
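
Putting the pieces together, a minimal sketch of the whole flow might look like the following. The names readWhenParsed and main are hypothetical helpers, not part of Puppeteer; the selector is the one from the question; and the sketch assumes the default navigation timeout is acceptable once goto no longer waits for the load event (pass a timeout option if it is not).

```javascript
// Sketch: navigate, but resume as soon as the HTML is parsed (DOMContentLoaded)
// instead of waiting for the full load event.
// `readWhenParsed` and `main` are hypothetical names, not part of Puppeteer.
async function readWhenParsed(page, link, selector) {
  // Resolves on DOMContentLoaded; slow ads, images and subframes are not awaited.
  await page.goto(link, { waitUntil: 'domcontentloaded' });

  // Resolves once the element exists in the DOM and is visible.
  const handle = await page.waitForSelector(selector, { visible: true });

  // Read the element's text inside the page context.
  return handle.evaluate((el) => el.textContent.trim());
}

async function main(link) {
  // Required here so the helper above can be exercised without a browser.
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch({ headless: false });
  try {
    const page = await browser.newPage();
    const author = await readWhenParsed(page, link, 'a[class="author"]');
    console.log(author);
  } finally {
    // Close the browser even if navigation or the wait fails.
    await browser.close();
  }
}
```

To try it, call main(link).catch(console.error) with the URL of the slow page. Keeping the navigation logic in a function that takes the page as an argument is only a design choice for this sketch; it makes the waiting strategy easy to swap or test in isolation.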