Tags: javascript, timeout, puppeteer

Skip to next step on timeout


I'm using Puppeteer to open each website from a list of URLs, grab a few pieces of data, and then write them to a CSV.

While there are a few elements that could be collected from a given URL, not all URLs will have all elements.

When my code is unable to find one of the stated elements (by XPath), it times out and stops the script altogether. Instead, I would like it to record null or 0 to indicate that no data was gathered from that URL for that element.

I tried adjusting the timeout duration, but it doesn't move on to the next step; it just exits the script altogether (as it does with the default timeout).

As there will be instances where the XPath can't be found, I don't want to disable the timeout, since it would just wait forever at that point.

Here's my code as it currently stands:

```
const puppeteer = require('puppeteer');
const fs = require('fs');
const csv = require('csv-parser');
const createCsvWriter = require('csv-writer').createObjectCsvWriter;

(async () => {
  const browser = await puppeteer.launch({ headless: true });
  const page = await browser.newPage();
  
  const urls = [];
    fs.createReadStream('urls.csv')
        .pipe(csv())
        .on('data', (row) => {
            urls.push(row.url); // Assuming the CSV has a column named 'url'
        })
        .on('end', async () => {
                       
            for (const url of urls) {
                await page.goto(url, { waitUntil: 'networkidle2' });
                const url_visited = url;

                //* PRICE 1

                let xpath_ELEMENT_1 = 'XPATH';
                const el1 = await page.waitForSelector('xpath/' + xpath_ELEMENT_1);
                const ELEMENT_1 = await page.evaluate(el => el.textContent.trim(), el1);

                //* PRICE 2

                let xpath_ELEMENT_2 = 'XPATH';
                const el2 = await page.waitForSelector('xpath/' + xpath_ELEMENT_2);
                const ELEMENT_2 = await page.evaluate(el => el.textContent.trim(), el2);

                // create csv file
                const csvWriter = createCsvWriter({
                    path: 'output.csv',
                    header: [
                        {id: 'url', title: 'URL'},
                        {id: 'price1', title: 'Price1'},
                        {id: 'price2', title: 'Price2'}
                    ]
                });

                // create record using collected data
                const records = [
                    {url: url_visited, price1: ELEMENT_1, price2: ELEMENT_2}
                ];

                // write record to csv
                await csvWriter.writeRecords(records);
            }

            await browser.close();
        });
})();
```

Solution

  • You need to wrap your code in try...catch blocks so you can catch the timeout errors, keep them from stopping the script, and write nulls to your results.

    Something like this:

    try {
      await page.goto(url, { waitUntil: "networkidle2" });
    
      let ELEMENT_1 = null;
      let ELEMENT_2 = null;
    
      try {
        const el1 = await page.waitForSelector("xpath/XPATH_1", { timeout: 3000 });
        ELEMENT_1 = await page.evaluate((el) => el.textContent.trim(), el1);
      } catch (error) {
        // element wasn't found within the timeout; ELEMENT_1 stays null
      }
    
      try {
        const el2 = await page.waitForSelector("xpath/XPATH_2", { timeout: 3000 });
        ELEMENT_2 = await page.evaluate((el) => el.textContent.trim(), el2);
      } catch (error) {
        // element wasn't found within the timeout; ELEMENT_2 stays null
      }
    } catch (error) {
      // navigation failed; both elements stay null
    }
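
  • For completeness, here is a minimal sketch of how those try...catch blocks could slot back into the original loop, assuming the same placeholder XPaths and the csv-writer setup from the question (the urls.csv parsing is replaced by a placeholder array here for brevity). Anything that can't be found within its timeout is simply left as null in its record, the writer is created once, and all rows are written after the loop:

    const puppeteer = require('puppeteer');
    const createCsvWriter = require('csv-writer').createObjectCsvWriter;

    (async () => {
      const browser = await puppeteer.launch({ headless: true });
      const page = await browser.newPage();

      // Create the writer once so every URL ends up in the same output.csv.
      const csvWriter = createCsvWriter({
        path: 'output.csv',
        header: [
          { id: 'url', title: 'URL' },
          { id: 'price1', title: 'Price1' },
          { id: 'price2', title: 'Price2' }
        ]
      });

      // Placeholder list; in the real script this comes from urls.csv.
      const urls = ['https://example.com'];

      const records = [];
      for (const url of urls) {
        let ELEMENT_1 = null;
        let ELEMENT_2 = null;

        try {
          await page.goto(url, { waitUntil: 'networkidle2' });

          try {
            // 3000 ms is an arbitrary per-element timeout; tune it to the site.
            const el1 = await page.waitForSelector('xpath/XPATH_1', { timeout: 3000 });
            ELEMENT_1 = await page.evaluate(el => el.textContent.trim(), el1);
          } catch (error) {
            // element wasn't found in time; ELEMENT_1 stays null
          }

          try {
            const el2 = await page.waitForSelector('xpath/XPATH_2', { timeout: 3000 });
            ELEMENT_2 = await page.evaluate(el => el.textContent.trim(), el2);
          } catch (error) {
            // element wasn't found in time; ELEMENT_2 stays null
          }
        } catch (error) {
          // navigation itself failed; both values stay null
        }

        records.push({ url: url, price1: ELEMENT_1, price2: ELEMENT_2 });
      }

      // Write all rows at once after the loop.
      await csvWriter.writeRecords(records);
      await browser.close();
    })();

    Keeping the per-element timeout small means a missing element only costs a few seconds per URL instead of stopping the whole run.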