node.js, web-crawler, domcrawler

Node.js Promises returning undefined for some elements, what am I doing wrong or what can I improve?


Some of my promises are resolving to "undefined" and I can't see what I'm doing wrong. I tried adding a timeout, but that didn't solve it. I'm still learning and struggling to spot my error.

It looks like the issue is in the scrapeSpecificOperatorStats block, though some of the returns are fine.

Example, expected:

  {
    name: 'Kapkan',
    imageUrl: 'https://staticctf.ubisoft.com/J3yJr34U2pZ2Ieem48Dwy9uqj5PNUQTn/7MofnDHeL1uwsenBVjxplQ/1e5af8fe9cf6f36516c7f6e5d56fcac0/r6-operators-list-kapkan.png',
    linkUrl: 'https://www.ubisoft.com/en-us/game/rainbow-six/siege/game-info/operators/kapkan',
    operatorStats: { specialty: 'ANTI-ENTRY, TRAPPER' }
  }

Actual:

  {
    name: 'Kapkan',
    imageUrl: 'https://staticctf.ubisoft.com/J3yJr34U2pZ2Ieem48Dwy9uqj5PNUQTn/7MofnDHeL1uwsenBVjxplQ/1e5af8fe9cf6f36516c7f6e5d56fcac0/r6-operators-list-kapkan.png',
    linkUrl: 'https://www.ubisoft.com/en-us/game/rainbow-six/siege/game-info/operators/kapkan',
    operatorStats: undefined
  }

The issue is only with operatorStats, and only for some of the elements.

My code:

const axios = require("axios");
const cheerio = require("cheerio");

// Define the URL of the webpage
const url = "https://www.ubisoft.com/en-us/game/rainbow-six/siege/game-info/operators";

// Define a function to scrape the operator stats from the link URL
const scrapeSpecificOperatorStats = async (link) => {
  try {
    // Make a HTTP request to the webpage
    const response = await axios.get(link);

    // Load the HTML content into Cheerio
    const $ = cheerio.load(response.data);

    // Find and Extract the operator specialty
    const specialty = $("span.operator__header__roles").text();

    // Return the results
    return { specialty };
  } catch (error) {
    console.error(error);
  }
};

// Define a function to scrape the operator stats
const scrapeOperatorStats = async (url) => {
  try {
    // Make a HTTP request to the webpage
    const response = await axios.get(url);

    // Load the HTML content into Cheerio
    const $ = cheerio.load(response.data);

    // Find the element that contains the operator stats
    const statsElement = $(".oplist__card");

    // Create an array of promises for each operator stat element
    const statsPromises = statsElement.map(async (index, element) => {
      // Extract the operator name
      const name = $(element).find("span").text();

      // Extract the image URL and link URL
      const imageUrl = $(element).find("img").attr("src");
      const linkUrl = "https://www.ubisoft.com" + $(element).attr("href");

      // Get the operator stats
      const operatorStats = await scrapeSpecificOperatorStats(linkUrl);

      // Return the result
      return {
        name,
        imageUrl,
        linkUrl,
        operatorStats,
      };
    });

    // Wait for all promises to resolve and return the results
    return Promise.all(statsPromises);
  } catch (error) {
    console.error(error);
  }
};

// Call the function and log the results
scrapeOperatorStats(url)
  .then((results) => console.log(results))
  .catch((error) => console.error(error));

Solution

  • The problem is that scrapeSpecificOperatorStats returns undefined when an error occurs, because you have a try/catch and the catch just logs out the error. That means execution falls off the end of the function, which does an implicit return of undefined.
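    A minimal repro of that failure mode, with a hypothetical `mightFail` function standing in for the scraper: the catch only logs, so execution falls through and the promise resolves to undefined.

    ```javascript
    // Hypothetical stand-in for scrapeSpecificOperatorStats.
    const mightFail = async (shouldFail) => {
      try {
        if (shouldFail) throw new Error("network error");
        return { specialty: "TRAPPER" };
      } catch (error) {
        console.error(error.message);
        // no return here -> implicit `return undefined`
      }
    };

    mightFail(true).then((result) => console.log(result)); // logs: undefined
    ```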

    To fix it, either allow errors in scrapeSpecificOperatorStats to propagate rather than converting them to undefined (by removing the try/catch) and handle them at a higher level (such as in scrapeOperatorStats), or handle the fact that scrapeSpecificOperatorStats returns undefined when an error occurs. Your best bet is usually the former; handling errors too early is a common anti-pattern.
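    A sketch of the first suggestion. The `fakeGet` stub stands in for `axios.get` so this runs without network access, and the hard-coded specialty replaces the cheerio parsing step; the point is the error-handling shape, not the scraping itself.

    ```javascript
    // Stub for axios.get so the sketch runs offline (assumption, not real axios).
    const fakeGet = async (link) => {
      if (link.includes("missing")) throw new Error(`404 for ${link}`);
      return { data: "<html>...</html>" };
    };

    // No try/catch here: a failed request rejects the returned promise
    // instead of silently resolving to undefined.
    const scrapeSpecificOperatorStats = async (link) => {
      const response = await fakeGet(link);
      // ...parse response.data with cheerio here, as in the original code...
      return { specialty: "ANTI-ENTRY, TRAPPER" };
    };

    // The caller decides what a failure means -- here, one catch per call.
    const run = async () => {
      try {
        return await scrapeSpecificOperatorStats("https://example.com/missing");
      } catch (error) {
        return { specialty: null, error: error.message };
      }
    };

    run().then((result) => console.log(result));
    // logs: { specialty: null, error: '404 for https://example.com/missing' }
    ```

    With this shape, a single operator page failing produces an explicit error record rather than an unexplained undefined in the results.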