javascript node.js async-await puppeteer tor

Best Practice for retrying page.goto, page.waitForNavigation etc. in puppeteer/JavaScript


I'm trying to scrape some web pages on the Tor network, using Puppeteer and the tor package (apt install tor). Probably due to the nature of Tor connections, I sometimes get a timeout. In addition, I'm new to asynchronous programming in JavaScript.

Usually I have an error-handling construct like one of these:

await Promise.all([
  page.goto(url),
  page.waitForNavigation({
    waitUntil: 'domcontentloaded'
  }),
]).catch((err) => { logMyErrors(err, true); });

or

let langMenu = await page.waitForXPath('//*[contains(@class, ".customer_name")]/ancestor::li').catch((err) => { logMyErrors(err, true); });

But I think that one or more retries would often help to finally get the desired resource. Is there a best practice for implementing retries?


Solution

  • I would recommend this rather simple approach:

    async function retry(promiseFactory, retryCount) {
      try {
        return await promiseFactory();
      } catch (error) {
        if (retryCount <= 0) {
          throw error;
        }
        return await retry(promiseFactory, retryCount - 1);
      }
    }
    

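    This function calls promiseFactory and waits for the returned Promise to settle. If the Promise rejects, the process is repeated (recursively) until retryCount reaches 0, at which point the last error is rethrown. The factory is therefore called at most retryCount + 1 times. A factory function is passed instead of a Promise because a Promise cannot be restarted; each attempt needs a fresh call.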
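
    To see the attempt counting in action without a browser, here is a minimal sketch. The makeFlaky helper and demo function are hypothetical stand-ins for a call like page.goto that times out a few times before succeeding; they are not part of Puppeteer.

```javascript
// Same retry helper as above, repeated so the sketch runs on its own.
async function retry(promiseFactory, retryCount) {
  try {
    return await promiseFactory();
  } catch (error) {
    if (retryCount <= 0) {
      throw error;
    }
    return await retry(promiseFactory, retryCount - 1);
  }
}

// Hypothetical stand-in for a flaky call (e.g. a page load over Tor):
// rejects on the first `failures` calls, then resolves.
function makeFlaky(failures) {
  let calls = 0;
  const fn = async () => {
    calls += 1;
    if (calls <= failures) {
      throw new Error(`timeout on attempt ${calls}`);
    }
    return `ok after ${calls} attempts`;
  };
  fn.calls = () => calls; // expose the call counter for inspection
  return fn;
}

async function demo() {
  const flaky = makeFlaky(2);
  // 1 initial call + up to 2 retries = at most 3 attempts
  const result = await retry(flaky, 2);
  return { result, calls: flaky.calls() };
}
```

    With two failures and retryCount set to 2, the third attempt succeeds. With retryCount set to 1, the same factory would exhaust its attempts and the last error would be rethrown to the caller.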

    Code Sample

    You can use the function like this:

    await retry(
      () => page.waitForXPath('//*[contains(@class, ".customer_name")]/ancestor::li'),
      5 // retry this 5 times
    );
    

    You can also pass any other function returning a Promise like Promise.all:

    await retry(
      () => Promise.all([
        page.goto(url),
        page.waitForNavigation({ waitUntil: 'domcontentloaded' }),
      ]),
      1 // retry only once
    );
    

    Don't combine await and catch

    One more piece of advice: avoid combining await with .then or .catch, as this can lead to unexpected behaviour. Either use await and surround your code with a try..catch block, or use .then and .catch. Otherwise await receives whatever the .catch handler returns: in your example the error is logged, langMenu becomes undefined, and the code after it keeps running as if nothing had failed.

    Instead, use try..catch like this:

    try {
      // ...
    } catch (error) {
      logMyErrors(error);
    }
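
    To make the difference concrete, here is a small sketch that runs without Puppeteer. failingLookup and this version of logMyErrors are hypothetical stand-ins for a timed-out page.waitForXPath call and for your logging function.

```javascript
// Hypothetical stand-ins, not part of the original code.
const logged = [];
function logMyErrors(error) {
  logged.push(error.message);
}
async function failingLookup() {
  throw new Error('waitForXPath timed out');
}

async function withChainedCatch() {
  // .catch handles the rejection, so await receives the handler's
  // return value. The handler returns nothing, so langMenu is undefined.
  const langMenu = await failingLookup().catch((err) => { logMyErrors(err); });
  return langMenu;
}

async function withTryCatch() {
  try {
    const langMenu = await failingLookup();
    return langMenu;
  } catch (error) {
    logMyErrors(error);
    return null; // explicit fallback, chosen here only for illustration
  }
}
```

    Both variants log the error, but with try..catch the failure path is visible in the control flow, and nothing after the failed await runs by accident.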