I have 400,000 cases with latitudes and longitudes. I want to convert these to zip codes. The code below works...
from geopy.geocoders import Nominatim

geolocator = Nominatim(user_agent='my-application')

def get_zipcode(df, geolocator, lat_field, lon_field):
    # Reverse-geocode one row and return its postcode, or None if there is none.
    location = geolocator.reverse((df[lat_field], df[lon_field]))
    if location is None:
        return None
    return location.raw.get('address', {}).get('postcode')
But only on smaller batches, and even then it is slow: about 15 minutes for 2,000 cases.
dfbatch1['pickup_zip'] = dfbatch1.apply(get_zipcode, axis=1, geolocator=geolocator, lat_field='pickup_latitude', lon_field='pickup_longitude')
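As a rough extrapolation (assuming the per-row time stays about the same, which may not hold for a remote service), the timing above suggests the full data set would take on the order of two days:

```python
# Observed timing from the small batch: about 15 minutes for 2,000 rows.
observed_rows = 2_000
observed_seconds = 15 * 60
total_rows = 400_000

# Average time per reverse-geocoding call, then scale up linearly.
seconds_per_row = observed_seconds / observed_rows
estimated_hours = total_rows * seconds_per_row / 3600

print(f"{seconds_per_row:.2f} s per row, about {estimated_hours:.0f} hours in total")
```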
What would be the best way to convert all of my latitudes & longitudes to zip codes?
Thanks!
Warning: not a GIS expert here!
It seems like this would be pretty easy using geopandas and a source of zip code polygons (noting, of course, that zip codes are not, in fact, polygons):
For example, if I have a point data source with (lat, lon) pairs in a file points.geojson, I could do something like this:
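import geopandas

# Points to look up, and the zip code polygons to match them against
points = geopandas.read_file('points.geojson')
zipcodes = geopandas.read_file('zip_poly.gdb')

# Left join keeps every point, even those that match no polygon
zip_points = points.sjoin(zipcodes, how='left')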
The default behavior of sjoin is to perform an intersects query, which is what we want.
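If the coordinates live in a pandas DataFrame rather than a GeoJSON file, as in the question, the same join can be sketched like this. The DataFrame contents and the single rectangular "zip code" here are made up for illustration, and I'm assuming the coordinates are plain WGS 84 latitude/longitude (EPSG:4326); real polygons would come from a file as above.

```python
import pandas as pd
import geopandas
from shapely.geometry import box

# Hypothetical stand-in for the question's DataFrame.
df = pd.DataFrame({
    'pickup_latitude': [42.35959, 40.0],
    'pickup_longitude': [-71.05671, -75.0],
})

# Build point geometries. Note the argument order: x is longitude, y is latitude.
points = geopandas.GeoDataFrame(
    df,
    geometry=geopandas.points_from_xy(df['pickup_longitude'], df['pickup_latitude']),
    crs='EPSG:4326',  # assumption: coordinates are WGS 84 lat/lon
)

# Toy polygon layer: one made-up rectangle, not a real zip code boundary.
zipcodes = geopandas.GeoDataFrame(
    {'ZIP_CODE': ['02109']},
    geometry=[box(-71.06, 42.35, -71.05, 42.37)],
    crs='EPSG:4326',
)

# Make sure both layers use the same coordinate reference system before joining.
zipcodes = zipcodes.to_crs(points.crs)

# Spell out the (default) intersects predicate; unmatched points get NaN.
zip_points = points.sjoin(
    zipcodes[['ZIP_CODE', 'geometry']], how='left', predicate='intersects'
)

# The left join preserves the original index, so the result aligns with df.
df['pickup_zip'] = zip_points['ZIP_CODE']
print(df)
```

One caveat: if the polygons overlap, a point can match more than one of them and appear in several rows of the join result, in which case the last assignment would need a de-duplication step first.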
That gives me a geodataframe that maps coordinates (in the .geometry attribute) to zip codes (in the .ZIP_CODE attribute). I used this source for zip code data.
For example, given a point:
>>> points.query('NAME == "Boston"').geometry
1436 POINT (-71.05671 42.35959)
Name: geometry, dtype: geometry
I now know its zip code:
>>> zip_points.query('NAME=="Boston"').ZIP_CODE
1436 02109
Name: ZIP_CODE, dtype: object
I tested this using a data source with about 4,000 points; I don't have anything approaching your 400,000-point data set handy.
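One way to get a feel for how this scales without the real data is to time the join on synthetic points. The sketch below is only that: the points are random, and the "zip codes" are a made-up 10 x 10 grid of rectangles, which are far simpler than real zip code polygons, so real timings will likely be slower.

```python
import time

import numpy as np
import geopandas
from shapely.geometry import box

n_points = 400_000
rng = np.random.default_rng(0)

# Random points in a 1 x 1 degree area (arbitrary location, for illustration).
points = geopandas.GeoDataFrame(
    geometry=geopandas.points_from_xy(
        rng.uniform(-72.0, -71.0, n_points),  # longitudes
        rng.uniform(42.0, 43.0, n_points),    # latitudes
    ),
    crs='EPSG:4326',
)

# Made-up "zip codes": a 10 x 10 grid of rectangles covering the same area.
x_edges = np.linspace(-72.0, -71.0, 11)
y_edges = np.linspace(42.0, 43.0, 11)
cells = [
    box(x_edges[i], y_edges[j], x_edges[i + 1], y_edges[j + 1])
    for i in range(10)
    for j in range(10)
]
zipcodes = geopandas.GeoDataFrame(
    {'ZIP_CODE': [f'{k:05d}' for k in range(len(cells))]},
    geometry=cells,
    crs='EPSG:4326',
)

# Time only the spatial join itself.
start = time.perf_counter()
joined = points.sjoin(zipcodes, how='left', predicate='intersects')
elapsed = time.perf_counter() - start

print(f"Joined {n_points} points against {len(zipcodes)} polygons in {elapsed:.1f} s")
```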