html, node.js, web-scraping, cheerio

Using cheerio to scrape HTML: can't retrieve the second of two consecutive elements


I am using cheerio to scrape an HTML document like the one shown below. I need to get the href of the two a elements in each article element.

<article>
  <div class="row">
       <div class="col-md-5 col-6">
          <a  class="btn" href="https://xxxxxx.png">abc1</a>
       </div>
       <div class="col-md-5 col-6">
          <a class="btn"  href="https://xxxxx">abc2</a>
       </div>
  </div>
</article>

<article>
   ....
</article>

....

Below is my script. It uses .btn to find the links and nth-child to pick them in order. It successfully gets the href of the first element, but it cannot get the value of the second one. Any idea how to solve the problem?

const $ = cheerio.load(html);
$("article").each((i, element) => {
    let element1 = $(element).find(".btn:nth-child(1)").attr("href");
    let element2 = $(element).find(".btn:nth-child(2)").attr("href");

    console.log(element1, element2);
});

Solution

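  • The :nth-child(n) selector matches elements that are the n-th child of their immediate parent. That is why .btn:nth-child(2) returns no elements: the second a tag is also the first child of its immediate parent (the div with classes col-md-5 and col-6). For the same reason, .btn:nth-child(1) matches both links, and .attr("href") reads the value from the first match, which is why element1 appears to work.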

    You can access both a tags as follows:

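    const cheerio = require("cheerio");

    // html is assumed to hold the page markup as a string
    const $ = cheerio.load(html);
    $("article").each((i, element) => {
        // every .btn inside this article, in document order
        const allBtns = $(element).find(".btn");
        const element1 = $(allBtns.get(0)).attr("href");
        const element2 = $(allBtns.get(1)).attr("href");

        console.log(element1, element2);
    });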
    

    Here we collect all the elements with the btn class inside the article, then take the first and second elements of that list (indices 0 and 1, since the index is zero-based).
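
To make the difference between the two lookups concrete, here is a small dependency-free sketch that models the matching rule on a plain object tree. The tree shape and the helper names (findByClass, findByClassNthChild, hrefs) are hypothetical and only mirror the article from the question. This is not how cheerio is implemented, just an illustration of the rule described above.

```javascript
// Hypothetical model of one <article> from the question.
// Each node has a tag, optional classes, optional attrs, and children.
const article = {
  tag: "article",
  children: [
    {
      tag: "div",
      classes: ["row"],
      children: [
        {
          tag: "div",
          classes: ["col-md-5", "col-6"],
          children: [
            { tag: "a", classes: ["btn"], attrs: { href: "https://xxxxxx.png" }, children: [] },
          ],
        },
        {
          tag: "div",
          classes: ["col-md-5", "col-6"],
          children: [
            { tag: "a", classes: ["btn"], attrs: { href: "https://xxxxx" }, children: [] },
          ],
        },
      ],
    },
  ],
};

// Collect all descendants carrying the class, in document order.
function findByClass(node, cls) {
  const found = [];
  for (const child of node.children) {
    if ((child.classes || []).includes(cls)) found.push(child);
    found.push(...findByClass(child, cls));
  }
  return found;
}

// Model of ".cls:nth-child(n)": keep only matches that are the n-th
// child (1-based) of their own immediate parent.
function findByClassNthChild(node, cls, n) {
  const found = [];
  node.children.forEach((child, index) => {
    if ((child.classes || []).includes(cls) && index + 1 === n) found.push(child);
    found.push(...findByClassNthChild(child, cls, n));
  });
  return found;
}

const hrefs = (nodes) => nodes.map((node) => node.attrs.href);

const firstChildBtns = hrefs(findByClassNthChild(article, "btn", 1));
const secondChildBtns = hrefs(findByClassNthChild(article, "btn", 2));
const allBtns = hrefs(findByClass(article, "btn"));

console.log(firstChildBtns);  // both links: each is the 1st child of its div
console.log(secondChildBtns); // empty: no .btn is the 2nd child of its parent
console.log(allBtns[0], allBtns[1]); // indexing the full list reaches both
```

In cheerio itself, $(element).find(".btn").eq(1).attr("href") should give the same result as the .get(1) form shown above, since .eq(i) returns the i-th match already wrapped as a cheerio object.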