python pandas data-analysis geopy nominatim

Python: get district and state columns for each lat/long coordinate with geopy


I have a list of 200+ latitude and longitude coordinate pairs.

For each coordinate pair, I want a dataframe row holding its district and its state, so the dataframe will have three columns: cord, district and state.

I am using the geopy library for this, but I am unable to get results for more than 115 coordinates.

Sample Data

    cord
0   (19.4, 17.93)
1   (55.54, 93.93)
2   (52.45, 78.93)
3   (65.54, 67.93)
4   (47.74, 99.93)


Required Output Demo

    cord            district  state
0   (19.4, 17.93)   xyz       aaa
1   (55.54, 93.93)  adc       aaa
2   (52.45, 78.93)  gyu       drt
3   (65.54, 67.93)  www       bhn
4   (47.74, 99.93)  ccf       bvg


I have tried this code, but I am unable to fetch details for more than 115 queries.

from geopy.geocoders import Nominatim

district = {}  # maps each coordinate to its district
geo_loc = [(19.4, 17.93), (55.54, 93.93)]  # list of (lat, long) tuples; the real list has 200+
for cord in geo_loc:
    geolocator = Nominatim(user_agent='user_agent')
    location = geolocator.reverse(cord, addressdetails=True)
    district[cord] = location.raw['address']['state_district']


I need to fetch up to 500 unique coordinates at a time.
I also need the district and state names in separate columns.


Solution

  • The Nominatim Usage Policy forbids heavy use: "No heavy uses (an absolute maximum of 1 request per second)." You can use geopy's RateLimiter to keep to 1 request per second. I've tested that the following code works for more than 115 requests:

    import pandas as pd
    from geopy.extra.rate_limiter import RateLimiter
    from geopy.geocoders import Nominatim

    geolocator = Nominatim(user_agent="user_agent")
    # wrap reverse() so that calls are at least 1 second apart
    reverse = RateLimiter(geolocator.reverse, min_delay_seconds=1)
    state_list = []  # initialize empty list
    # create dataframe
    df = pd.DataFrame({"geo_loc": [(19.4, 17.93), (55.54, 93.93), (52.45, 78.93), (65.54, 67.93), (47.74, 99.93)]})
    # get the coordinate tuples
    geo_loc = df.geo_loc.values
    for cord in geo_loc:
        # send request
        location = reverse(cord, addressdetails=True)
        # get state value (None if the key is missing)
        state = location.raw["address"].get("state")
        # store state value
        state_list.append(state)
    # assign states back to the dataframe
    df['states'] = state_list
    print(df)
    

    Resulting dataframe:

            geo_loc                           states
    0   (19.4, 17.93)                   Tibesti تيبستي
    1  (55.54, 93.93)                Красноярский край
    2  (52.45, 78.93)                   Алтайский край
    3  (65.54, 67.93)  Ямало-Ненецкий автономный округ
    4  (47.74, 99.93)                         Архангай
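
The code above fills only a state column, while the question also asks for the district. Here is a minimal sketch of one way to get both. It assumes the district is the state_district key used in the question (Nominatim does not return that key for every location) and that reverse is the rate-limited callable defined above. district_and_state and add_district_state are hypothetical helper names, not part of geopy:

```python
import pandas as pd


def district_and_state(location):
    """Return (district, state) for a geopy Location, or (None, None)."""
    if location is None:  # reverse() returns None when nothing is found
        return None, None
    address = location.raw.get("address", {})
    # .get() yields None instead of raising KeyError when a key is absent
    return address.get("state_district"), address.get("state")


def add_district_state(df, reverse, column="geo_loc"):
    """Reverse-geocode every coordinate in `column`; add district and state columns."""
    rows = [district_and_state(reverse(cord, addressdetails=True))
            for cord in df[column]]
    out = df.copy()
    out["district"] = [district for district, _ in rows]
    out["state"] = [state for _, state in rows]
    return out
```

With the objects from the answer, the call would be df = add_district_state(df, reverse). Rows where Nominatim returns no match, or no such key, get a missing value instead of raising an error.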