
Extract a variable from a script tag of an HTML page in JavaScript


How can I extract a variable from a script tag in a returned HTML page using JavaScript/TypeScript?

My API request to the server: const response = await fetch( ... )

The response contains a large HTML page; here is a shortened example:

<h1>Welcome to the Steam app data page</h1>

<script type="text/javascript">
  var g_rgAppContextData = {
    "730": {
      "appid": 730,
      "name": "Counter-Strike 2",
      "icon": "https://cdn.fastly.steamstatic.com/steamcommunity/public/images/apps/730/8dbc71957312bbd3baea65848b545be9eae2a355.jpg",
      "link": "https://steamcommunity.com/app/730"
    }
  };
  var g_rgCurrency = [];
</script>

I only want to extract the variable g_rgAppContextData and nothing else. I know that I can select the script tag with getElementsByTagName("script"), but what if there are two script tags? And how do I select only the variable?


Solution

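  • Since the pages you want to scrape follow a fixed pattern, it seems possible to make a few simplifying assumptions about the structure of the returned HTML. The code below relies on the following:

      • The assignment appears literally in the page as g_rgAppContextData = {...};
      • The object literal is valid JSON (double-quoted keys and strings, no trailing commas, no comments).
      • The first }; after the assignment closes the object literal, so the sequence }; does not occur inside it (for example, within a string value).

    Let me know if these assumptions are not justified in your case.

    Under these assumptions, you can extract the variable value with a regular expression and parse it as JSON: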

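    const response = await fetch("...");
    const html = await response.text();
    // Lazily capture the object literal up to the first "};".
    // The s flag lets "." match newlines, since the literal spans several lines.
    const match = html.match(/g_rgAppContextData\s*=\s*(\{.*?\});/s);
    if (!match) throw new Error("g_rgAppContextData not found in the response");
    const g_rgAppContextData = JSON.parse(match[1]);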
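
    If the page contains several script tags, the same idea can be wrapped in a small helper that checks each inline script separately. This is only a sketch under the same assumptions as above (the value is valid JSON and the first }; closes the literal). The function name extractJsonVariable and the sample HTML are made up for illustration, and parsing script tags with a regex is a shortcut, not a full HTML parser.

```javascript
// Hypothetical helper (not part of any library): finds `<name> = {...};`
// in the inline scripts of an HTML string and parses the literal as JSON.
function extractJsonVariable(html, name) {
  // Escape regex metacharacters so names containing e.g. "$" are matched literally.
  const escaped = name.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // Lazy match up to the first "};"; the s flag lets "." span newlines.
  const assignment = new RegExp(`(?<![\\w$])${escaped}\\s*=\\s*(\\{.*?\\});`, "s");

  // Inspect each <script>...</script> body on its own,
  // so a match can never stretch across two script tags.
  for (const [, body] of html.matchAll(/<script\b[^>]*>(.*?)<\/script>/gis)) {
    const match = body.match(assignment);
    if (match) return JSON.parse(match[1]);
  }
  return null; // variable not found in any inline script
}

// Made-up sample page with two script tags.
const sampleHtml = `
<h1>Steam app data</h1>
<script>var g_rgCurrency = [];</script>
<script type="text/javascript">
  var g_rgAppContextData = {
    "730": { "appid": 730, "name": "Counter-Strike 2" }
  };
</script>`;

const data = extractJsonVariable(sampleHtml, "g_rgAppContextData");
console.log(data["730"].name); // prints "Counter-Strike 2"
```

    Returning null instead of throwing lets the caller decide how to handle a page that lacks the variable. JSON.parse will still throw if the literal turns out not to be valid JSON.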