Guys, I am using cheerio to scrape an HTML document, shown below. I need to find the href of the two a elements in each article element.
<article>
  <div class="row">
    <div class="col-md-5 col-6">
      <a class="btn" href="https://xxxxxx.png">abc1</a>
    </div>
    <div class="col-md-5 col-6">
      <a class="btn" href="https://xxxxx">abc2</a>
    </div>
  </div>
</article>
<article>
....
</article>
....
Below is my script, which uses .btn to find the elements and nth-child to get them in order. It successfully gets the href of the first element, but it cannot get the value of the second element. Any idea how to solve the problem?
const $ = cheerio.load(html);
$("article").each((i, element) => {
  let element1 = $(element).find(".btn:nth-child(1)").attr("href");
  let element2 = $(element).find(".btn:nth-child(2)").attr("href");
  console.log(element1, element2);
});
The nth-child(num) selector matches elements that are the num-th child of their immediate parent. That's why .btn:nth-child(2) returns no elements: the second a tag is also the first child of its immediate parent (the div with classes col-md-5 and col-6).
You could access both a tags in the following manner:
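const cheerio = require("cheerio");

// html is the page markup from the question
const $ = cheerio.load(html);
$("article").each((i, element) => {
  // All .btn links inside this article, in document order
  let allBtns = $(element).find(".btn");
  let element1 = $(allBtns.get(0)).attr("href");
  let element2 = $(allBtns.get(1)).attr("href");
  console.log(element1, element2);
});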
In this case, we get all the elements with the btn class and then pick the 1st and 2nd elements from that list (zero-based index).