Tags: google-apps-script, web-scraping, urlfetch

Web scraping with Google Apps Script


I'm trying to pull data from the following sample web page using Google Apps Script:

url = http://www.premierleague.com/players/2064/Wayne-Rooney/stats?se=54

using UrlFetchApp.fetch(url)

The problem is that when I use UrlFetchApp.fetch(url), I don't get the page content selected by the 'se' parameter in the URL. Instead, I get the content of the following URL, because it looks like the 'se=54' data is loaded asynchronously:

http://www.premierleague.com/players/2064/Wayne-Rooney/stats

Is there any way to pass the 'se' parameter some other way? I was looking at the function, and it accepts an 'options' argument, but the documentation on it is very limited.
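For reference, the second argument to UrlFetchApp.fetch is a plain object. Fields such as "method", "headers", and "muteHttpExceptions" are documented parameters of the service; the values below are only illustrative, a sketch of the shape rather than a working request:

```javascript
// Sketch of the options object accepted by UrlFetchApp.fetch(url, options).
// The field names are documented parameters; the values are illustrative.
var options = {
  "method": "get",               // HTTP method to use
  "headers": {                   // extra request headers to send
    "Origin": "http://www.premierleague.com"
  },
  "muteHttpExceptions": true     // return the response even on 4xx/5xx status codes
};
// UrlFetchApp.fetch(url, options) would then send the request with these settings.
```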


Solution

  • Go to that website in your browser and open the developer tools (F12 or Ctrl+Shift+I). Click the Network tab and reload the page with F5. A list of requests will appear; near the bottom of the list you should see the asynchronous requests made to fetch the stats. Those requests get the data as JSON from footballapi.pulselive.com. You can do the same thing in Apps Script, but you have to send a correct "Origin" header or your request gets rejected. Here is an example:

    function fetchData() {
      // Query the JSON endpoint that the page itself calls asynchronously.
      var url = "http://footballapi.pulselive.com/football/stats/player/2064?comps=1";
      var options = {
        // Without a matching Origin header the API rejects the request.
        "headers": {
          "Origin": "http://www.premierleague.com"
        }
      };
      var json = JSON.parse(UrlFetchApp.fetch(url, options).getContentText());
      // Log only the "goals" entry from the stats array.
      for (var i = 0; i < json.stats.length; i++) {
        if (json.stats[i].name === "goals") Logger.log(json.stats[i]);
      }
    }
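If you need more than one stat, the loop above can be factored into a small lookup helper. The function name findStat and the sample object below are hypothetical; the sample only mirrors the shape of the pulselive response (an array of objects with "name" and "value" fields), with made-up numbers:

```javascript
// Hypothetical helper: pick one entry out of the parsed "stats" array
// by its "name" field. Returns null when the stat is not present.
function findStat(stats, name) {
  for (var i = 0; i < stats.length; i++) {
    if (stats[i].name === name) return stats[i];
  }
  return null;
}

// Minimal stand-in for the API response (values are made up):
var sample = { stats: [{ name: "appearances", value: 393 },
                       { name: "goals", value: 208 }] };
var goals = findStat(sample.stats, "goals");
// goals.value → 208
```

In Apps Script you would call findStat(json.stats, "goals") on the object returned by JSON.parse instead of the sample above.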