python pandas nominatim

Use lambda function to pull latitude & longitude out of city names


I have a dataframe of 1100 rows with moving data: things like origin cities and countries as well as destination cities and countries.

The process I'm working through involves taking place names (e.g. Portland, Oregon) and sending them to the Nominatim search endpoint (https://nominatim.openstreetmap.org/search/) to pull out the latitude and longitude.

I found a pretty good one-off example on Stack Overflow:

import requests
import urllib.parse

address = 'Portland, Oregon'
url = 'https://nominatim.openstreetmap.org/search/' + urllib.parse.quote(address) +'?format=json'

response = requests.get(url).json()
print(response[0]["lat"])
print(response[0]["lon"])

This works great even when I have non-city entries (e.g. Texas, United States or Bavaria, Germany).

The issue I'm running into now is that I can't quite get the code to run down my list of locations in my dataframe column and pull out the info I need.

Here is my code:

segment1 = 'https://nominatim.openstreetmap.org/search/'
segment3 = '?format=json'
df1['json_location_data'] = df1.apply(lambda x: requests.get(segment1 + urllib.parse.quote(str(df1['Origin'])) + segment3).json())

I'm getting an error that reads:

ValueError: Expected a 1D array, got an array with shape (1100, 17)

Not sure how to fix this error, so I created a reproducible example here:

import pandas as pd
locations = ['Portland, Oregon', 'Seattle, Washington','New York, New York','Texas, United States']
df = pd.DataFrame(locations, columns=['locations'])

segment1 = 'https://nominatim.openstreetmap.org/search/'
segment3 = '?format=json'
df['json_location_data'] = df.apply(lambda x: requests.get(segment1 + urllib.parse.quote(str(df['locations'])) + segment3).json())

This runs without producing any errors, but returns a column that is all NaN.

How can I solve this issue and get the desired data?


Solution

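  • Here's a version that works. In the original, df.apply with its default axis=0 calls the lambda once per column rather than once per row, and the lambda ignores its argument x and sends str(df['locations']), the printed form of the whole column, as the query. Calling apply on df['locations'] instead passes one location string per call. Note that I'm extracting only the lat and lon from the rather large structure that gets returned.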

    import urllib.parse
    import pandas as pd
    import requests
    
    locations = ['Portland, Oregon', 'Seattle, Washington','New York, New York','Texas, United States']
    df = pd.DataFrame(locations, columns=['locations'])
    
    segment1 = 'https://nominatim.openstreetmap.org/search/'
    segment3 = '?format=json'
    def getdata(loc):
        print(loc)
        data = requests.get(segment1 + urllib.parse.quote(loc) + segment3).json()
        return {'lat':data[0]['lat'],'lon':data[0]['lon']}
    
    df['json_location_data'] = df['locations'].apply(getdata)
    print(df)
    

    Output:

    Portland, Oregon
    Seattle, Washington
    New York, New York
    Texas, United States
                  locations                           json_location_data
    0      Portland, Oregon  {'lat': '45.5202471', 'lon': '-122.674194'}
    1   Seattle, Washington  {'lat': '47.6038321', 'lon': '-122.330062'}
    2    New York, New York  {'lat': '40.7127281', 'lon': '-74.0060152'}
    3  Texas, United States  {'lat': '31.2638905', 'lon': '-98.5456116'}
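
For the full 1100-row dataframe, a few extra precautions may help. The following is only a sketch under stated assumptions, not a definitive implementation: it uses the query-string form of the search endpoint (search?q=...), since the path-style URL above may not be accepted by current Nominatim servers; it sends a User-Agent header and pauses between requests, in line with the Nominatim usage policy (at most one request per second); it returns NaN when a search finds nothing; and it looks up each distinct location only once. The names fetch_json, getcoords and add_coords and the User-Agent string are hypothetical, and the column is assumed to contain no missing values.

```python
import time

import pandas as pd
import requests

SEARCH_URL = "https://nominatim.openstreetmap.org/search"
# Hypothetical identifier: replace with something that identifies your application.
HEADERS = {"User-Agent": "moving-data-geocoder-example"}


def fetch_json(loc):
    """Query Nominatim for one free-form location string and return the parsed JSON."""
    resp = requests.get(
        SEARCH_URL,
        params={"q": loc, "format": "json", "limit": 1},
        headers=HEADERS,
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()


def getcoords(loc, fetch=fetch_json, pause=1.0):
    """Return (lat, lon) as floats, or (NaN, NaN) when the search finds nothing."""
    data = fetch(loc)
    time.sleep(pause)  # wait between requests to stay within the rate limit
    if not data:
        return (float("nan"), float("nan"))
    return (float(data[0]["lat"]), float(data[0]["lon"]))


def add_coords(df, column, fetch=fetch_json, pause=1.0):
    """Return a copy of df with float 'lat' and 'lon' columns for df[column]."""
    # Look up each distinct location once, so repeated cities cost one request.
    lookup = {loc: getcoords(loc, fetch, pause) for loc in df[column].unique()}
    out = df.copy()
    out["lat"] = out[column].map(lambda loc: lookup[loc][0])
    out["lon"] = out[column].map(lambda loc: lookup[loc][1])
    return out
```

With the example dataframe, df = add_coords(df, 'locations') would add numeric lat and lon columns instead of a column of dictionaries. The fetch parameter exists so that the lookup can be replaced with a stand-in function when trying the code out without network access.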