So, I'm working on a mapping application for geotagged images and I want to include address information for the points of interest on my map. I have managed to complete most of the task using GeoPandas, GeoPy, and Nominatim with point data from a PostGIS table (e.g. POINT Z (8.726176366993529 50.10868874301912 96.90000000000001)).
While the script does most of what I want, the result contains a lot of extraneous information, and I'd like to reduce it to just one or two pieces of data before updating my database. I was able to hack together my script using two articles on geocoding and reverse geocoding. My issue is that I'm not sure how the script receives the response object, or how I can access its properties either before or after they're added to my DataFrame.
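My code is as follows:
import geopandas as gpd
import psycopg2
from geopy.extra.rate_limiter import RateLimiter
from geopy.geocoders import Nominatim
from sqlalchemy import create_engine
from tqdm import tqdm
conn = psycopg2.connect(
    host="localhost",
    database="Nizz0k",
    user="Nizz0k",
    password="")
sql = "select * from public.\"Peng\""
# engine for writing results back later; the URL path is the database name, not the table
engine = create_engine("postgresql://Nizz0k@localhost:5432/Nizz0k")
df = gpd.read_postgis(sql, conn, geom_col="geom")
df['lon'] = df.geometry.apply(lambda p: p.x)
df['lat'] = df.geometry.apply(lambda p: p.y)
# Nominatim.reverse() accepts a "lat, lon" string
df['geocode'] = df['lat'].map(str) + ', ' + df['lon'].map(str)
locator = Nominatim(user_agent="pengMappingAgent", timeout=10)
# Nominatim's usage policy allows at most one request per second
rgeocode = RateLimiter(locator.reverse, min_delay_seconds=1)
tqdm.pandas()
df['address'] = df['geocode'].progress_apply(rgeocode)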
So, my Python knowledge is very limited, and nothing I've tried for accessing the properties in the newly created df['address'] column has worked. Calling df.head() shows the correctly created column and address information, but now I want to simplify the info in the column and extract parts of it into new columns. Ideally, I'd like to pull out the street and house number and the neighborhood, and get rid of the city, county, state, and country information, as it's redundant.
Based on the research I've done, I should be able to pull this information out of the response object, but I'm not sure where or how to access it. It seems that this info gets converted to a string in my column (I think), and if not, I'm not sure how to set up a loop or lambda function to get it out. In the worst case, I assume some string manipulation could achieve my goal, but it seems like there should be an easier way.
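A minimal sketch of pulling the wanted fields out of the existing df['address'] column, assuming each cell holds the geopy Location object returned by the RateLimiter-wrapped Nominatim.reverse (or None when nothing was found). Location.raw is the dict Nominatim returned, and its "address" entry maps component names such as road, house_number and suburb to values. The helper names split_address and add_address_columns, and the fallback from neighbourhood to suburb, are illustrative choices, not part of geopy.

```python
import pandas as pd


def split_address(location):
    """Return the wanted parts of one reverse-geocoding result as a dict.

    `location` is assumed to be a geopy Location (or None); Location.raw is
    the dict Nominatim returned, with the components under "address".
    """
    if location is None:
        return {"road": None, "house_number": None, "neighbourhood": None}
    addr = location.raw.get("address", {})
    return {
        "road": addr.get("road"),
        "house_number": addr.get("house_number"),
        # not every point has a "neighbourhood" key, so fall back to "suburb"
        "neighbourhood": addr.get("neighbourhood") or addr.get("suburb"),
    }


def add_address_columns(df, column="address"):
    """Join one new column per key of split_address() onto df."""
    parts = pd.DataFrame([split_address(loc) for loc in df[column]], index=df.index)
    return df.join(parts)
```

With the frame from the question this would be called as df = add_address_columns(df); the original address column can then be dropped with df.drop(columns="address") before writing back to the database.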
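import geopandas as gpd
import pandas as pd
import shapely.geometry
from geopy.extra.rate_limiter import RateLimiter
from geopy.geocoders import Nominatim
# built-in sample data; gpd.datasets was removed in GeoPandas 1.0, so this line needs an older version
gdf = gpd.read_file(gpd.datasets.get_path("naturalearth_lowres"))
# a geodata frame with a few points (the vertices of Belgium's boundary)
df = gpd.GeoDataFrame(
    geometry=gdf.loc[gdf["iso_a3"].eq("BEL"), "geometry"]
    .apply(lambda p: p.exterior.coords)
    .explode()
    .apply(shapely.geometry.Point),
    crs="EPSG:4326",
).reset_index(drop=True)
locator = Nominatim(user_agent="pengMappingAgent", timeout=10)
# Nominatim's usage policy allows at most one request per second
rgeocode = RateLimiter(locator.reverse, min_delay_seconds=1)
def address_parts(p):
    # reverse() accepts a "lat, lon" string; .raw is the dict Nominatim returned
    location = rgeocode(f"{p.y}, {p.x}")
    return location.raw["address"] if location else {}
# expand each address dict into columns (one per key) and join them onto the points
df = df.join(df["geometry"].apply(address_parts).apply(pd.Series))
print(df.head(3).to_markdown(index=False))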
geometry | road | suburb | city | county | state | postcode | country | country_code | house_number | village | hamlet | town | region | locality | municipality | isolated_dwelling | neighbourhood | tourism |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
POINT (6.15665815595878 50.80372101501058) | A 4 | Verlautenheide | Aachen | Städteregion Aachen | Nordrhein-Westfalen | 52080 | Deutschland | de | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan |
POINT (6.043073357781111 50.12805166279423) | Beieknapp | nan | nan | Canton Clervaux | nan | 9962 | Lëtzebuerg | lu | 14 | Holler | nan | nan | nan | nan | nan | nan | nan | nan |
POINT (5.782417433300907 50.09032786722122) | nan | nan | nan | Bastogne | Luxembourg | 6600 | België / Belgique / Belgien | be | nan | Noville | Neufmoulin | Bastogne | Wallonie | nan | nan | nan | nan | nan |
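Starting from the expanded frame above, one way to get closer to what the question asks for is to merge road and house_number into a single street string and drop the keys that are redundant for a single-city map. This is a sketch under assumptions: the REDUNDANT list and the helper names street_line and compact_address are hypothetical, and which keys Nominatim returns varies from point to point, which is also why the table shows nan wherever a point had no value for a column.

```python
import pandas as pd

# Hypothetical list: the keys the question calls redundant (adjust to your data)
REDUNDANT = ["city", "county", "state", "country", "country_code"]


def street_line(row):
    """Combine 'road' and 'house_number' into one string, e.g. 'Beieknapp 14'."""
    road, number = row.get("road"), row.get("house_number")
    if pd.isna(road):
        return None
    return road if pd.isna(number) else f"{road} {number}"


def compact_address(df):
    """Return a copy of df with a 'street' column and the redundant keys removed."""
    out = df.copy()
    out["street"] = out.apply(street_line, axis=1)
    # errors="ignore" skips keys that none of the points happened to return
    return out.drop(columns=REDUNDANT + ["road", "house_number"], errors="ignore")
```

For example, print(compact_address(df).head(3)) keeps the geometry plus whatever neighbourhood-level keys (suburb, village, and so on) were returned, with road and house number collapsed into street.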