javascript, node.js, web-scraping, puppeteer

Puppeteer error: waiting for selector times out


I'm working with a site that has the following in its HTML, which I confirmed by inspecting the elements in Chrome developer tools:

<div class="hdp-photo-carousel" style="transform: translateX(0px);">
  <div class="photo-tile photo-tile-large">

I can watch the page open and see that the element is there. Then, after 30 seconds, I get this error:

UnhandledPromiseRejectionWarning: TimeoutError: waiting for selector ".photo-tile" failed: timeout 30000ms exceeded

My Puppeteer code for this is:

const pptrFirefox = require('puppeteer-firefox');

(async () => {
  const browser = await pptrFirefox.launch({headless: false});
  const page = await browser.newPage();
  await page.goto('https://zillow.com');
  await page.type('.react-autosuggest__input', '8002 Blandwood Rd. Downey, CA 90240');
  await page.click('.zsg-search-button_primary');
  await page.waitForSelector('.photo-tile');
  console.log('did I get this far?');
})();

Can anyone tell me what I'm doing wrong?


Solution

  • The site has changed in the 4 years since this was asked, but it's a common story: an element is hand-verified to exist in dev tools and the selector is copied to Puppeteer, but the wait for it times out.

    There are at least a few common reasons for this:

    One debugging strategy is to run headfully (OP is already doing this, but future visitors may not be). If the code works headfully but not headlessly, the site is likely detecting you as a bot only in headless mode. See the canonical thread "Why does headless need to be false for Puppeteer to work?" for next steps. Logging the HTML with console.log(await page.content()) can help establish whether you're being blocked in headless mode.
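The page.content() check above can be folded into the wait itself. The following is a minimal sketch, not part of the original answer: waitForSelectorOrDump is a hypothetical helper name, and it assumes only that page is a Puppeteer Page exposing waitForSelector and content. When the wait fails, it prints the start of the HTML Puppeteer actually received, which often shows a block page or captcha rather than the expected markup.

```javascript
// Hypothetical helper (not from the original answer): wait for a selector
// and, if the wait fails, log the HTML Puppeteer actually sees.
async function waitForSelectorOrDump(page, selector, options = {}) {
  try {
    // Resolves with the element handle once the selector appears.
    return await page.waitForSelector(selector, options);
  } catch (err) {
    // Dump the start of the document to check for a block page or captcha.
    const html = await page.content();
    console.error(`Waiting for "${selector}" failed: ${err.message}`);
    console.error(html.slice(0, 2000));
    // Rethrow so the caller still sees the original failure.
    throw err;
  }
}
```

In the question's script, this would replace the bare wait with something like await waitForSelectorOrDump(page, '.photo-tile').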

    If running headfully still doesn't work, look at the page to see why. In some cases, the page may show a captcha, in which case see "Bypassing CAPTCHAs with Headless Chrome using puppeteer". This appears to be the case in the current question at the time of writing.

    Typically, adding more waitForNavigation calls and setting timeouts to 0 doesn't help (unless you're navigating between pages with a click or form submission, in which case waitForNavigation may be appropriate).
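For the case where a click really does load a new page, the usual pattern is to start waiting for the navigation before triggering the click, so the navigation can't finish before the wait begins. This is a sketch under that assumption; clickAndWaitForNavigation is a hypothetical helper name, and whether the search button in the question triggers a full navigation is not something the source confirms.

```javascript
// Hypothetical helper: click an element that triggers a page load and
// wait for that navigation to complete.
async function clickAndWaitForNavigation(page, selector, options = {}) {
  // waitForNavigation is started first, then the click; Promise.all
  // resolves once both have finished.
  const [response] = await Promise.all([
    page.waitForNavigation(options),
    page.click(selector),
  ]);
  // The navigation's response, as resolved by waitForNavigation.
  return response;
}
```

If the click only updates the page in place (no navigation), this wait will itself time out, and waiting for a selector is the better fit.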

    Disclosure: I'm the author of the linked blog post.