
How to set the AirBNB API "extensions" parameter?


I am trying to scrape AirBNB by plain HTTP requests and noticed something.

Let's say we use this search string: "New York, New York, United States".

The simplest working request (stripped of unnecessary headers and fields) that returns the desired results is this:

GET /api/v3/ExploreSections?operationName=ExploreSections&locale=en&currency=USD&variables=%7B%22isInitialLoad%22%3Atrue%2C%22hasLoggedIn%22%3Afalse%2C%22cdnCacheSafe%22%3Afalse%2C%22source%22%3A%22EXPLORE%22%2C%22exploreRequest%22%3A%7B%22metadataOnly%22%3Afalse%2C%22version%22%3A%221.8.3%22%2C%22itemsPerGrid%22%3A20%2C%22placeId%22%3A%22ChIJOwg_06VPwokRYv534QaPC8g%22%2C%22query%22%3A%22New%20York%2C%20New%20York%2C%20United%20States%22%2C%22cdnCacheSafe%22%3Afalse%2C%22screenSize%22%3A%22large%22%2C%22isInitialLoad%22%3Atrue%2C%22hasLoggedIn%22%3Afalse%7D%2C%22removeDuplicatedParams%22%3Atrue%7D&extensions=%7B%22persistedQuery%22%3A%7B%22version%22%3A1%2C%22sha256Hash%22%3A%2282cc0732fe2a6993a26859942d1342b6e42830704b1005aeb2d25f78732275e7%22%7D%7D HTTP/2
Host: www.airbnb.com
X-Airbnb-Api-Key: d306zoyjsyarp7ifhu67rjxn52tv0t20
Accept-Encoding: gzip, deflate

At this point, that API key is pretty much public, so exposing it is not a concern.

The readable content of the "variables" parameter is this:

{
  "isInitialLoad": true,
  "hasLoggedIn": false,
  "cdnCacheSafe": false,
  "source": "EXPLORE",
  "exploreRequest": {
    "metadataOnly": false,
    "version": "1.8.3",
    "itemsPerGrid": 20,
    "placeId": "ChIJOwg_06VPwokRYv534QaPC8g",
    "query": "New York, New York, United States",
    "cdnCacheSafe": false,
    "screenSize": "large",
    "isInitialLoad": true,
    "hasLoggedIn": false
  },
  "removeDuplicatedParams": true
}

The readable content of the "extensions" parameter is this:

{
  "persistedQuery": {
    "version": 1,
    "sha256Hash": "82cc0732fe2a6993a26859942d1342b6e42830704b1005aeb2d25f78732275e7"
  }
}
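For reference, here is a minimal sketch of how the request URL above could be assembled from those two objects. The function name `buildExploreUrl` is hypothetical; the endpoint, parameter names, and fixed `locale`/`currency` values are copied from the request shown earlier, and `encodeURIComponent` is used because it produces the same `%20`-style escaping as that request.

```javascript
// Hypothetical helper (not part of any Airbnb library): rebuilds the
// ExploreSections URL from a plain "variables" object and a hash.
function buildExploreUrl(variables, sha256Hash) {
  // Same shape as the decoded "extensions" parameter shown above.
  const extensions = { persistedQuery: { version: 1, sha256Hash } };

  // Parameter order and fixed values mirror the captured request.
  const params = [
    ['operationName', 'ExploreSections'],
    ['locale', 'en'],
    ['currency', 'USD'],
    ['variables', JSON.stringify(variables)],
    ['extensions', JSON.stringify(extensions)],
  ];

  // encodeURIComponent escapes spaces as %20, quotes as %22, braces as %7B/%7D.
  const query = params
    .map(([key, value]) => `${key}=${encodeURIComponent(value)}`)
    .join('&');

  return `https://www.airbnb.com/api/v3/ExploreSections?${query}`;
}
```

Called with the "variables" object and the hash shown above, this should reproduce the query string of the captured request, so the only unknown input left is the hash itself.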

I am trying to figure out where that hash comes from.

It seems it's calculated from a GraphQL query but I don't know anything else and there is no documentation about it.

Any help?


Solution

  • I had the same issue (I wanted to get the prices). After inspecting the HAR files that Chrome can export, I found that this value comes from a JavaScript file called PdpPlatformRoute.xxx.js

    To get the hash, load the file PdpPlatformRoute.xxx.js, then parse it to extract the "operationId" value.

    In case it helps, this is how I did it:

    // contentPage is the HTML content of the listing page (e.g. https://www.airbnb.com/rooms/1234567).
    // Returns the URL of the PdpPlatformRoute bundle, or undefined if the page does not reference one.
    function getPdpPlatformRouteUrl(contentPage) {
      const fileName = `${contentPage}`.match(/(PdpPlatformRoute\.\w+\.js)/)?.[1];
      return fileName && 'https://a0.muscache.com/airbnb/static/packages/web/en/frontend/gp-stays-pdp-route/routes/' + fileName;
    }

    // textContent is the JS content that you get when you fetch the previously found URL.
    // [^']+ stops at the closing quote; a greedy .* can run past it in a minified one-line file.
    function getSha256(textContent) {
      return `${textContent}`.match(/name:'StaysPdpSections',type:'query',operationId:'([^']+)'/)?.[1];
    }
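The same parsing step can be sketched for an arbitrary operation name and then turned into the "extensions" object from the question. This is an untested guess rather than confirmed behavior: `extractOperationId` and `toExtensions` are hypothetical names, and the sketch assumes that other operations (such as ExploreSections) are declared in their own bundles with the same `name:'…',type:'query',operationId:'…'` shape, which the answer above only confirms for StaysPdpSections. It also assumes the id is a 64-character lowercase hex string, as in the question's hash.

```javascript
// Hypothetical generalization of getSha256: look up an operation by name.
function extractOperationId(jsText, operationName) {
  // Escape regex metacharacters so the name is matched literally.
  const name = operationName.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

  // Accept either quote style; capture exactly 64 hex characters (assumption).
  const pattern = new RegExp(
    `name:["']${name}["'],type:["']query["'],operationId:["']([0-9a-f]{64})["']`
  );

  // null when the operation is not declared in this file.
  return String(jsText).match(pattern)?.[1] ?? null;
}

// Wraps a hash in the "extensions" structure shown in the question.
function toExtensions(sha256Hash) {
  return { persistedQuery: { version: 1, sha256Hash } };
}

// Usage with a made-up bundle fragment (the id here is a placeholder, not a real hash).
const sampleBundle =
  "e={name:'StaysPdpSections',type:'query',operationId:'" + 'ab'.repeat(32) + "'}";
const sampleId = extractOperationId(sampleBundle, 'StaysPdpSections');
console.log(JSON.stringify(toExtensions(sampleId)));
```

Returning null instead of undefined makes a missing operation easy to detect before building a request, which matters because the bundle file name and its contents can change whenever the site is redeployed.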