python pandas geolocation nominatim

Bulk process a number of latitude and longitude values from a dataframe and make a new column from those responses


My DataFrame has two columns, latitude and longitude. I want to use them to compute a new column, test_url, that holds the country for each coordinate pair.

I'm using the Nominatim (OpenStreetMap) reverse geocoding API URL for this.

My imports:

import pandas as pd
import requests

My check_country function:

def check_country(url):
    
    r = requests.get(url)
    results = r.json()['address']
    
    return results['country']

Column calculation:

df['test_url'] = df[['latitude','longitude']].apply(lambda x : check_country(f"https://nominatim.openstreetmap.org/reverse?lat={x[0]}&lon={x[1]}&format=json"),axis=1)

But with this I'm getting a connection error.

Error

ConnectionError: HTTPSConnectionPool(host='nominatim.openstreetmap.org', port=443): Max retries exceeded with url: /reverse?lat=10.75161&lon=77.11299&format=json (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x000002262FC94C40>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it'))

Solution

  • You can use GeoPandas with the "World Administrative Boundaries" dataset to do the lookups locally instead of sending HTTP requests. First, download the GeoJSON file and install geopandas, then:

    # Python env: pip install geopandas
    # Anaconda env: conda install geopandas
    
    import geopandas as gpd
    from shapely.geometry import Point
    
    gdf = gpd.read_file('world-administrative-boundaries.geojson')
    # Point takes (x, y), i.e. (longitude, latitude)
    p = Point(77.11299, 10.75161)
    
    out = gdf.loc[gdf.intersects(p), 'name']
    print(out)
    
    # Output:
    226    India
    Name: name, dtype: object
    

    Advanced usage: Multiple coordinates:

    # (latitude, longitude) pairs
    coords = [(40.730610, -73.935242), (10.75161, 77.11299)]
    # Swap the order when building points: Point expects (lon, lat)
    points = [Point(lon, lat) for lat, lon in coords]
    dfp = gpd.GeoDataFrame({'geometry': points}, crs=gdf.crs)
    out = gpd.sjoin(dfp, gdf, predicate='within')
    print(out)
    
    # Output
                         geometry  index_right           french_short iso3        status iso_3166_1_alpha_2_codes                      name            region color_code continent
    0  POINT (-73.93524 40.73061)          182  États-Unis d'Amérique  USA  Member State                       US  United States of America  Northern America        USA  Americas
    1   POINT (77.11299 10.75161)          226                   Inde  IND  Member State                       IN                     India     Southern Asia        IND      Asia
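
To get back to the original question (a new column on a DataFrame that already has latitude and longitude columns), the same spatial join can be applied to the whole frame at once. The following is only a minimal sketch: the sample `df` and the `country` column name are assumptions, and the two rectangles are a made-up stand-in for the boundaries file so that the snippet runs without it. In practice `gdf` would come from `gpd.read_file('world-administrative-boundaries.geojson')` as above.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import box

# Stand-in for gpd.read_file('world-administrative-boundaries.geojson'):
# two rough, made-up bounding boxes, used only so the sketch runs without the file.
gdf = gpd.GeoDataFrame(
    {"name": ["India", "United States of America"]},
    geometry=[box(68, 6, 98, 36), box(-125, 24, -66, 50)],
    crs="EPSG:4326",
)

# Hypothetical sample of the question's DataFrame.
df = pd.DataFrame({
    "latitude": [10.75161, 40.730610, 0.0],
    "longitude": [77.11299, -73.935242, -30.0],
})

# Build point geometries as (x=longitude, y=latitude).
points = gpd.GeoDataFrame(
    df.copy(),
    geometry=gpd.points_from_xy(df["longitude"], df["latitude"]),
    crs=gdf.crs,
)

# A left join keeps every input row; points outside all polygons get NaN.
joined = gpd.sjoin(points, gdf[["name", "geometry"]], how="left", predicate="within")

# If a point matched more than one polygon, keep the first match per row.
joined = joined[~joined.index.duplicated(keep="first")]

# Assignment aligns on the index, so each row gets its own country.
df["country"] = joined["name"]
print(df)
```

Rows that fall outside every polygon (the third point here, in the middle of the Atlantic) end up with NaN in `country`, so it is worth checking for missing values afterwards.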