Tags: web-scraping, inspect, jqxhr

Web Scraping Underlying Data from Online Interactive Map


I am trying to get the underlying data from the interactive map on this website: https://www.sabrahealth.com/properties

I tried using the Inspect feature in Google Chrome to find the XHR request that would hold the locations of all the points on the map, but nothing appeared. Is there another way to extract the location data from this map?


Solution

  • Well, the location data is available to download on their site here. But let's assume you want the actual latitude/longitude values to do some analysis.

    The first thing I would do is exactly what you did (look for an XHR request). If I can't find anything there, the second thing I always do is search the HTML for <script> tags; sometimes the data is "hiding" in there. It takes a bit more detective work and doesn't always yield results, but it does in this case.
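
    For example, a quick first pass at that detective work is to print a short preview of any <script> tag that mentions a field you expect in the data. This is just a minimal sketch; I'm using 'latitude' as the search term, which is only an assumption about what the marker data contains:

    import requests
    import bs4

    url = 'https://www.sabrahealth.com/properties'
    headers = {'user-agent': 'Mozilla/5.0'}

    soup = bs4.BeautifulSoup(requests.get(url, headers=headers).text, 'html.parser')

    # Preview every <script> tag that looks like it contains coordinate data,
    # so you can spot which one is worth parsing properly.
    for i, script in enumerate(soup.find_all('script')):
        if 'latitude' in script.text:
            print(i, script.text[:200])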

    If you look within the <script> tags, you'll find the relevant data in JSON format. Then it's just a matter of finding it, trimming the string down to valid JSON, and feeding it to json.loads().

    import requests
    import bs4
    import json

    url = 'https://www.sabrahealth.com/properties'

    headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36'}

    response = requests.get(url, headers=headers)
    soup = bs4.BeautifulSoup(response.text, 'html.parser')

    # The marker data is embedded in a jQuery.extend(Drupal.settings, {...}); call
    # inside one of the <script> tags, so find that tag and trim the string down
    # to the JSON object between the parentheses.
    jsonObj = None
    scripts = soup.find_all('script')
    for script in scripts:
        if 'jQuery.extend(Drupal.settings,' in script.text:
            jsonStr = script.text.split('jQuery.extend(Drupal.settings,')[1]
            jsonStr = jsonStr.rsplit(');', 1)[0]
            jsonObj = json.loads(jsonStr)
            break

    # Each marker carries the coordinates directly, plus an HTML snippet ('text')
    # holding the property type, subcategory and location, which is parsed separately.
    for each in jsonObj['gmap']['auto1map']['markers']:
        name = each['markername']
        lat = each['latitude']
        lon = each['longitude']

        marker_soup = bs4.BeautifulSoup(each['text'], 'html.parser')

        prop_type = marker_soup.find('i', {'class': 'property-type'}).text.strip()
        sub_cat = marker_soup.find('span', {'class': 'subcat'}).text.strip()
        location = marker_soup.find('span', {'class': 'subcat'}).find_next('p').text.split('\n')[0]

        print('Type: %s\nSubCat: %s\nLat: %s\nLon: %s\nLocation: %s\n' % (prop_type, sub_cat, lat, lon, location))
    

    Output:

    Type: Senior Housing - Leased
    SubCat: Assisted Living
    Lat: 38.3309
    Lon: -85.862521
    Location: Floyds Knobs, Indiana
    
    Type: Skilled Nursing/Transitional Care
    SubCat: SNF
    Lat: 29.719507
    Lon: -99.06649
    Location: Bandera, Texas
    
    Type: Skilled Nursing/Transitional Care
    SubCat: SNF
    Lat: 37.189079
    Lon: -77.376015
    Location: Petersburg, Virginia
    
    Type: Skilled Nursing/Transitional Care
    SubCat: SNF
    Lat: 37.759998
    Lon: -122.254616
    Location: Alameda, California
    
    ...
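
    If the end goal is analysis rather than printing, here is a minimal follow-up sketch that writes the same fields out to a CSV. It continues from the code above (it reuses jsonObj and bs4) and only adds the standard-library csv module; the output filename and column names are my own choices:

    import csv
    import bs4

    # 'jsonObj' is the parsed Drupal settings object produced by the code above.
    rows = []
    for each in jsonObj['gmap']['auto1map']['markers']:
        marker_soup = bs4.BeautifulSoup(each['text'], 'html.parser')
        rows.append({
            'name': each['markername'],
            'latitude': each['latitude'],
            'longitude': each['longitude'],
            'location': marker_soup.find('span', {'class': 'subcat'}).find_next('p').text.split('\n')[0],
        })

    # Write one row per property marker for use in pandas, GIS tools, etc.
    with open('sabra_properties.csv', 'w', newline='') as f:
        writer = csv.DictWriter(f, fieldnames=['name', 'latitude', 'longitude', 'location'])
        writer.writeheader()
        writer.writerows(rows)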