Tags: node.js, api, axios, cheerio

How can I use cheerio to get data from parts of a site that are loaded after the initial page load?


I want to scrape the rows of the table on coinmarketcap.com with cheerio, but because most of the table is loaded after the page itself, I only get the first 10 rows. How can I get all the rows of this table?

The table on the first page has 100 rows and I need the data from all of them, but the code I wrote only returns the first 10. When the page loads, only the first 10 rows are shown at first; the rest are loaded and displayed afterwards.

const express = require("express");
const axios = require("axios");
const cheerio = require("cheerio");

const PORT = 8000;
const link = "https://coinmarketcap.com";

const app = express();

// Fetch the raw HTML (axios does not run the page's JavaScript).
axios
  .get(link)
  .then((response) => {
    const html = response.data;
    const $ = cheerio.load(html);

    // Print the src of every coin logo in the static HTML, with its index.
    $(".coin-logo").each(function (i) {
      console.log($(this).attr("src"), i);
    });
  })
  .catch((err) => console.error(err.message));

app.listen(PORT, () => console.log(`server is running on PORT: ${PORT}`));

Console output:

server is running on PORT: 8000
https://s2.coinmarketcap.com/static/img/coins/64x64/1.png 0
https://s2.coinmarketcap.com/static/img/coins/64x64/1027.png 1
https://s2.coinmarketcap.com/static/img/coins/64x64/825.png 2
https://s2.coinmarketcap.com/static/img/coins/64x64/1839.png 3
https://s2.coinmarketcap.com/static/img/coins/64x64/3408.png 4
https://s2.coinmarketcap.com/static/img/coins/64x64/52.png 5
https://s2.coinmarketcap.com/static/img/coins/64x64/2010.png 6
https://s2.coinmarketcap.com/static/img/coins/64x64/3890.png 7
https://s2.coinmarketcap.com/static/img/coins/64x64/74.png 8
https://s2.coinmarketcap.com/static/img/coins/64x64/5426.png 9

It returns only the first 10 rows, while the table has 100.


Solution

  • This is a React/Next.js app, so the rows beyond the first 10 aren't in the HTML markup that axios fetches; they're added to the DOM by JavaScript after the page loads. In single-page apps (SPAs), that data typically comes from an API endpoint, which you can often call directly if it isn't secured.

    In this case, the data is (fortunately) embedded in a <script id="__NEXT_DATA__"> tag, which the page's JavaScript reads after loading to build the elements you see in the dev tools. You can extract it as follows:

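    const axios = require("axios");
    const cheerio = require("cheerio");
    // Print nested objects in full instead of collapsing them to [Object].
    require("util").inspect.defaultOptions.depth = null;
    
    const url = "<Your URL>"; // e.g. https://coinmarketcap.com, as in the question
    
    axios.get(url).then(response => {
      const html = response.data;
      const $ = cheerio.load(html);
      // Next.js serializes the page's initial state as JSON in this script tag.
      const payload = $("#__NEXT_DATA__").first().text();
      // props.initialState is itself a JSON string, hence the second parse.
      const {data} = JSON.parse(JSON.parse(payload).props.initialState)
        .cryptocurrency.listingLatest;
      console.log(data);
    }).catch(err => console.error(err.message));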
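    The second JSON.parse is there because props.initialState is itself a JSON-encoded string nested inside the outer JSON. To see that shape without hitting the network, here is a minimal self-contained sketch. The payload is a hypothetical, heavily trimmed stand-in for the real __NEXT_DATA__ text: only the property path comes from the code above, and the values are illustrative.

```javascript
// Hypothetical stand-in for the text of <script id="__NEXT_DATA__">.
// Assumption: props.initialState is a JSON *string*, as the double
// JSON.parse in the answer implies; the values are illustrative only.
const innerState = {
  cryptocurrency: {
    listingLatest: {
      data: [{ keysArr: ["id", "name"] }, [1, "Bitcoin"]],
    },
  },
};
const payload = JSON.stringify({
  props: { initialState: JSON.stringify(innerState) },
});

// First parse: the outer document. initialState is still a string here.
const outer = JSON.parse(payload);
console.log(typeof outer.props.initialState); // string

// Second parse: decode the embedded state object, then walk the same path.
const { data } = JSON.parse(outer.props.initialState).cryptocurrency
  .listingLatest;
console.log(data.length); // 2
```

    If the site ever started embedding initialState as a plain object instead of a string, the second JSON.parse would throw a SyntaxError, so logging typeof as above is a cheap way to notice that kind of change.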
    

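    The data comes in a compact form without headers: the first element holds the column names (keysArr), and each remaining element is a plain array of values. If you want to map the column names onto the values so it's more readable, you can: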

    const payload = $("#__NEXT_DATA__").first().text();
    const {data} = JSON.parse(
      JSON.parse(payload).props.initialState
    ).cryptocurrency.listingLatest;
    // The first element holds the column names; the rest are rows of values.
    const [{keysArr}, ...rest] = data;
    // Pair each value with the column name at the same index.
    const withKeys = rest.map(row =>
      Object.fromEntries(
        row.map((value, i) => [keysArr[i] ?? "unknown", value])
      )
    );
    console.log(withKeys.slice(0, 10));
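    Here is the same header mapping run on a small hypothetical listing, so you can check what it produces without fetching the page. The key names and the Bitcoin/Ethereum values are taken from the output further down; the short key list, its order, and the trailing "extra" value are assumptions for illustration.

```javascript
// Hypothetical, trimmed-down listing in the shape the answer destructures:
// element 0 holds the column names, every later element is a row of values.
// Key order and the short key list are assumptions for illustration.
const data = [
  { keysArr: ["id", "name", "symbol", "quote.USD.price"] },
  [1, "Bitcoin", "BTC", 28422.366435538406],
  [1027, "Ethereum", "ETH", 1809.5420505966479, "extra"],
];

const [{ keysArr }, ...rest] = data;

// Zip each value with the column name at the same index; values past the
// end of keysArr fall back to "unknown", as in the answer's code.
const withKeys = rest.map((row) =>
  Object.fromEntries(row.map((value, i) => [keysArr[i] ?? "unknown", value]))
);

console.log(withKeys[0].symbol); // BTC
console.log(withKeys[1].unknown); // extra
```

    One consequence of the "unknown" fallback: if a row has several values past the end of keysArr, they all share that key and only the last one survives. Using a fallback such as `unknown_${i}` instead would keep them all.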
    

    Now, here's code that shows the data like the first few columns on the site:

    const summary = withKeys.map(e => ({
      "id": e.id,
      "name": e.name,
      "symbol": e.symbol,
      "price": e["quote.USD.price"],
      "1h": e["quote.USD.percentChange1h"],
      "24h": e["quote.USD.percentChange24h"],
      "marketCap": e["quote.USD.marketCap"],
    }));
    console.log(summary);
    console.log(summary.length); // => 100
    

    Output:

    [
      {
        id: 1,
        name: 'Bitcoin',
        symbol: 'BTC',
        price: 28422.366435538406,
        '1h': -0.02014955,
        '24h': 5.28633725,
        marketCap: 549443765021.2035
      },
      {
        id: 1027,
        name: 'Ethereum',
        symbol: 'ETH',
        price: 1809.5420505966479,
        '1h': 0.00783376,
        '24h': 3.55526047,
        marketCap: 221440656815.19766
      },
      {
        id: 825,
        name: 'Tether',
        symbol: 'USDT',
        price: 0.9998792896202411,
        '1h': -0.00074249,
        '24h': -0.02658359,
        marketCap: 79511295635.83586
      },
      // ...
    ]
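    If you also want the numbers to read like the site's columns, one option is Node's built-in Intl.NumberFormat. This is a sketch of my own, not something the page's data dictates: formatRow is a hypothetical helper, the rounding choices are assumptions, and it assumes the percentChange values are already percentages (5.286... meaning 5.29%), which is how they appear in the output above.

```javascript
// USD price with cents, e.g. "$28,422.37".
const usd = new Intl.NumberFormat("en-US", {
  style: "currency",
  currency: "USD",
});
// Market cap rounded to whole dollars. Both digit options are set so older
// Node versions don't reject a maximum below the currency's default minimum.
const usdWhole = new Intl.NumberFormat("en-US", {
  style: "currency",
  currency: "USD",
  minimumFractionDigits: 0,
  maximumFractionDigits: 0,
});

// Hypothetical helper: turns one `summary` entry into display strings.
// Assumes "1h"/"24h" are already percentages, so they are not multiplied.
function formatRow(row) {
  return {
    name: `${row.name} (${row.symbol})`,
    price: usd.format(row.price),
    "1h": `${row["1h"].toFixed(2)}%`,
    "24h": `${row["24h"].toFixed(2)}%`,
    marketCap: usdWhole.format(row.marketCap),
  };
}

// One entry copied from the output above.
const bitcoin = {
  id: 1,
  name: "Bitcoin",
  symbol: "BTC",
  price: 28422.366435538406,
  "1h": -0.02014955,
  "24h": 5.28633725,
  marketCap: 549443765021.2035,
};
console.log(formatRow(bitcoin));
```

    With the real data, console.table(summary.map(formatRow)) prints all 100 rows as a table in the terminal.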