I have a dataframe of 1100 rows with moving data: things like origin cities and countries as well as destination cities and countries.
The process I'm working through involves taking city names (e.g., Portland, Oregon) and sending them to the Nominatim search endpoint (https://nominatim.openstreetmap.org/search/) to pull out the latitude and longitude.
I found a pretty good one-off example on Stack Overflow:
import requests
import urllib.parse
address = 'Portland, Oregon'
url = 'https://nominatim.openstreetmap.org/search/' + urllib.parse.quote(address) +'?format=json'
response = requests.get(url).json()
print(response[0]["lat"])
print(response[0]["lon"])
This works great even when I have non-city entries (e.g., Texas, United States or Bavaria, Germany).
The issue I'm running into now is that I can't quite get the code to run down my list of locations in my dataframe column and pull out the info I need.
Here is my code:
segment1 = 'https://nominatim.openstreetmap.org/search/'
segment3 = '?format=json'
df1['json_location_data'] = df1.apply(lambda x: requests.get(segment1 + urllib.parse.quote(str(df1['Origin'])) + segment3).json())
I'm getting an error that reads:
ValueError: Expected a 1D array, got an array with shape (1100, 17)
Not sure how to fix this error, so I created a reproducible example here:
import pandas as pd
import requests
import urllib.parse
locations = ['Portland, Oregon', 'Seattle, Washington','New York, New York','Texas, United States']
df = pd.DataFrame(locations, columns=['locations'])
segment1 = 'https://nominatim.openstreetmap.org/search/'
segment3 = '?format=json'
df['json_location_data'] = df.apply(lambda x: requests.get(segment1 + urllib.parse.quote(str(df['locations'])) + segment3).json())
This works without producing any errors, but returns a column with all NAs.
How can I solve this issue and get the desired data?
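The all-NA column can be reproduced without touching the network. The sketch below is only an illustration: it replaces the HTTP request with a hypothetical stand-in function, describe, that reports what kind of object it was handed, and it compares calling apply on the whole dataframe with calling it on a single column.

```python
import pandas as pd

df = pd.DataFrame({'locations': ['Portland, Oregon', 'Seattle, Washington']})

# Hypothetical stand-in for the HTTP request: just reports what it was handed.
def describe(arg):
    return type(arg).__name__

# DataFrame.apply defaults to axis=0, so the function is called once per
# column and receives that whole column as a Series.
per_column = df.apply(describe)
print(per_column)

# The result is indexed by column name, so assigning it back aligns on
# labels that match no row, and the new column is filled with NaN.
df['from_frame'] = per_column
print(df['from_frame'].isna().all())

# Series.apply calls the function once per element instead.
df['from_column'] = df['locations'].apply(describe)
print(df['from_column'].tolist())
```

In the question's code the lambda also ignores its argument and calls str() on the entire column, so even a single request would carry the printed form of the whole column rather than one place name.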
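Here's a version that works. The key change is calling apply on the locations column rather than on the whole dataframe, so the function receives one place name at a time. Note that I'm extracting only the lat and lon from the rather large structure that gets returned.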
import urllib.parse
import pandas as pd
import requests
locations = ['Portland, Oregon', 'Seattle, Washington','New York, New York','Texas, United States']
df = pd.DataFrame(locations, columns=['locations'])
segment1 = 'https://nominatim.openstreetmap.org/search/'
segment3 = '?format=json'
def getdata(loc):
    print(loc)
    data = requests.get(segment1 + urllib.parse.quote(loc) + segment3).json()
    return {'lat':data[0]['lat'],'lon':data[0]['lon']}
df['json_location_data'] = df['locations'].apply(getdata)
print(df)
Output:
Portland, Oregon
Seattle, Washington
New York, New York
Texas, United States
              locations                           json_location_data
0      Portland, Oregon  {'lat': '45.5202471', 'lon': '-122.674194'}
1   Seattle, Washington  {'lat': '47.6038321', 'lon': '-122.330062'}
2    New York, New York  {'lat': '40.7127281', 'lon': '-74.0060152'}
3  Texas, United States  {'lat': '31.2638905', 'lon': '-98.5456116'}
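For the full 1100 rows, it may be worth guarding against lookups that return no match and storing the coordinates as numbers in their own columns. The following is a rough sketch under several assumptions, not a drop-in replacement: it assumes the query-parameter form of the search endpoint (q and format), the names extract_lat_lon and geocode and the User-Agent value are made up for illustration, and the one-second pause reflects my reading of the public server's usage policy, which asks for an identifying User-Agent and at most one request per second. The network call sits inside geocode and is not executed here; the parsing step is exercised with canned data taken from the output above.

```python
import time

import pandas as pd
import requests

SEARCH_URL = 'https://nominatim.openstreetmap.org/search'
# Hypothetical value: replace with something that identifies your application.
HEADERS = {'User-Agent': 'moving-data-geocoder-example'}

def extract_lat_lon(results):
    """Return (lat, lon) as floats from a parsed response, or (None, None) if empty."""
    if not results:
        return (None, None)
    first = results[0]
    return (float(first['lat']), float(first['lon']))

def geocode(loc, pause=1.0):
    """Look up one place name over the network (assumes the q/format parameters)."""
    response = requests.get(SEARCH_URL, params={'q': loc, 'format': 'json'},
                            headers=HEADERS, timeout=10)
    response.raise_for_status()
    time.sleep(pause)  # stay at or below one request per second
    return extract_lat_lon(response.json())

# Offline check of the parsing step with canned responses.
responses = {
    'Portland, Oregon': [{'lat': '45.5202471', 'lon': '-122.674194'}],
    'Nowhere': [],  # what an unmatched query is assumed to return
}
coords = pd.DataFrame([extract_lat_lon(r) for r in responses.values()],
                      index=list(responses), columns=['lat', 'lon'])
print(coords)
```

With a real dataframe, the same shape can be built from df['locations'].apply(geocode).tolist() using index=df.index, and then joined back onto df. At one request per second, 1100 rows take roughly 18 minutes, so caching results for repeated place names is worth considering.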