Tags: web-scraping, inspect, jqxhr

Web Scraping Underlying Data from Online Interactive Map


I am trying to get the underlying data from the interactive map on this website: https://www.sabrahealth.com/properties

I tried using the Inspect feature in Google Chrome to find the XHR request that would hold the locations of all the points on the map, but nothing appeared. Is there another way to extract the location data from this map?


Solution

  • Well, the location data is available to download on their site here. But let's assume you want the actual latitude/longitude values to do some analysis.

    The first thing I would do is exactly what you did (look for an XHR request). If I can't find anything there, the second thing I always do is search the HTML for <script> tags; sometimes the data is "hiding" in there. It takes a bit more detective work and doesn't always yield results, but it does in this case.
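
    For example, a quick first pass at that detective work is to print a short preview of any <script> tag that mentions a field you expect in the data. This is just a minimal sketch; I'm using 'latitude' as the search term, which is only an assumption about what the marker data contains:

    import requests
    import bs4

    url = 'https://www.sabrahealth.com/properties'
    headers = {'user-agent': 'Mozilla/5.0'}

    soup = bs4.BeautifulSoup(requests.get(url, headers=headers).text, 'html.parser')

    # Preview every <script> tag that looks like it contains coordinate data,
    # so you can spot which one is worth parsing properly.
    for i, script in enumerate(soup.find_all('script')):
        if 'latitude' in script.text:
            print(i, script.text[:200])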

    If you look within the <script> tags, you'll find the relevant data in JSON format. Then it's just a matter of finding it, trimming the string down to valid JSON, and feeding it to json.loads().

    import requests
    import bs4
    import json

    url = 'https://www.sabrahealth.com/properties'

    headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36'}

    response = requests.get(url, headers=headers)
    soup = bs4.BeautifulSoup(response.text, 'html.parser')

    # The marker data is embedded in a jQuery.extend(Drupal.settings, {...}); call
    # inside one of the <script> tags, so find that tag and trim the string down
    # to the JSON object between the parentheses.
    jsonObj = None
    scripts = soup.find_all('script')
    for script in scripts:
        if 'jQuery.extend(Drupal.settings,' in script.text:
            jsonStr = script.text.split('jQuery.extend(Drupal.settings,')[1]
            jsonStr = jsonStr.rsplit(');', 1)[0]
            jsonObj = json.loads(jsonStr)
            break

    # Each marker carries the coordinates directly, plus an HTML snippet ('text')
    # holding the property type, subcategory and location, which is parsed separately.
    for each in jsonObj['gmap']['auto1map']['markers']:
        name = each['markername']
        lat = each['latitude']
        lon = each['longitude']

        marker_soup = bs4.BeautifulSoup(each['text'], 'html.parser')

        prop_type = marker_soup.find('i', {'class': 'property-type'}).text.strip()
        sub_cat = marker_soup.find('span', {'class': 'subcat'}).text.strip()
        location = marker_soup.find('span', {'class': 'subcat'}).find_next('p').text.split('\n')[0]

        print('Type: %s\nSubCat: %s\nLat: %s\nLon: %s\nLocation: %s\n' % (prop_type, sub_cat, lat, lon, location))
    

    Output:

    Type: Senior Housing - Leased
    SubCat: Assisted Living
    Lat: 38.3309
    Lon: -85.862521
    Location: Floyds Knobs, Indiana
    
    Type: Skilled Nursing/Transitional Care
    SubCat: SNF
    Lat: 29.719507
    Lon: -99.06649
    Location: Bandera, Texas
    
    Type: Skilled Nursing/Transitional Care
    SubCat: SNF
    Lat: 37.189079
    Lon: -77.376015
    Location: Petersburg, Virginia
    
    Type: Skilled Nursing/Transitional Care
    SubCat: SNF
    Lat: 37.759998
    Lon: -122.254616
    Location: Alameda, California
    
    ...
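
    If the end goal is analysis rather than printing, here is a minimal follow-up sketch that writes the same fields out to a CSV. It continues from the code above (it reuses jsonObj and bs4) and only adds the standard-library csv module; the output filename and column names are my own choices:

    import csv
    import bs4

    # 'jsonObj' is the parsed Drupal settings object produced by the code above.
    rows = []
    for each in jsonObj['gmap']['auto1map']['markers']:
        marker_soup = bs4.BeautifulSoup(each['text'], 'html.parser')
        rows.append({
            'name': each['markername'],
            'latitude': each['latitude'],
            'longitude': each['longitude'],
            'location': marker_soup.find('span', {'class': 'subcat'}).find_next('p').text.split('\n')[0],
        })

    # Write one row per property marker for use in pandas, GIS tools, etc.
    with open('sabra_properties.csv', 'w', newline='') as f:
        writer = csv.DictWriter(f, fieldnames=['name', 'latitude', 'longitude', 'location'])
        writer.writeheader()
        writer.writerows(rows)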