Tags: python, geopandas, geopy, nominatim

How to parse address components returned by Nominatim reverse geocoding?


I'm working on a mapping application for geotagged images and I want to include address information for the points of interest on my map. I have managed to complete most of the task using GeoPandas, GeoPy, and Nominatim with point data from a PostGIS table (e.g. POINT Z (8.726176366993529 50.10868874301912 96.90000000000001)).

While the script does most of what I want, the result contains a lot of extraneous information, and I'd like to reduce it to just one or two pieces of data before updating my database. I hacked my script together using two articles on geocoding and reverse geocoding. My issue comes down to not being sure how the script receives the response object and how I can access its properties either before or after they're added to my DataFrame.

My code without import statements is as follows:

conn = psycopg2.connect(
    host="localhost",
    database="Nizz0k",
    user="Nizz0k",
    password="")
sql = "select * from public.\"Peng\""
engine = create_engine('postgresql://Nizz0k@localhost:5432/public.\"Peng\"')
df = gpd.read_postgis(sql, conn, geom_col="geom")
df['lon'] = df.geometry.apply(lambda p: p.x)
df['lat'] = df.geometry.apply(lambda p: p.y)
df['geocode'] = df['lat'].map(str) + ', ' + df['lon'].map(str)
locator = Nominatim(user_agent="pengMappingAgent", timeout=10)
rgeocode = RateLimiter(locator.reverse, min_delay_seconds=0.001)
tqdm.pandas()
df['address'] = df['geocode'].progress_apply(rgeocode)

My Python knowledge is very limited, and nothing I've tried to access the properties in the newly created df['address'] column has worked. Calling df.head() shows the correctly created column and address information, but now I want to simplify the info in the column and extract parts of it into new columns. Ideally, I'd like to pull out the street, house number, and neighborhood information and drop the city, county, state, and country information, as it's redundant.

Based on the research I've done, I should be able to pull this information out of the response object, but I'm not sure where or how to access it. It seems that this info gets converted to a string in my column (I think), and if not, I'm not sure how to set up a loop or lambda function to get this stuff out. Worst case, I assume just some string manipulation might achieve my goal, but it seems like there should be an easier way.
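For what it's worth, the cells in df['address'] are probably not strings: geopy's reverse() returns a Location object (or None when nothing is found), and the DataFrame display only shows its string form. The parsed response is available as .raw, and for Nominatim the components sit in .raw["address"]. Below is a minimal sketch of pulling out a few keys. The helper name pick_address_parts and the SimpleNamespace stand-in are made up for illustration so the snippet runs without a network call; the sample values are taken from the answer's output, and which keys Nominatim returns varies by location.

```python
from types import SimpleNamespace


def pick_address_parts(location, keys=("road", "house_number", "neighbourhood")):
    """Return only the requested address components; missing ones become None.

    `location` is expected to look like a geopy Location: an object whose
    `.raw` attribute is the parsed response dict. Hypothetical helper.
    """
    if location is None:  # reverse() returns None when nothing is found
        return {key: None for key in keys}
    address = location.raw.get("address", {})
    return {key: address.get(key) for key in keys}


# Stand-in for a geopy Location (assumption: only `.raw` is needed here).
fake_location = SimpleNamespace(
    raw={
        "address": {
            "road": "Beieknapp",
            "house_number": "14",
            "village": "Holler",
            "country_code": "lu",
        }
    }
)

parts = pick_address_parts(fake_location)
print(parts)
print(pick_address_parts(None))
```

With the real column, something like df['address'].apply(pick_address_parts).apply(pd.Series) should give one column per requested key.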


Solution

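Each Location returned by locator.reverse() exposes the parsed JSON response as .raw; its "address" entry is a dict of address components, and apply(pd.Series) expands those dicts into one column per component.

    import geopandas as gpd
    import shapely.geometry
    from geopy.geocoders import Nominatim
    import pandas as pd

    # Sample data. gpd.datasets was removed in GeoPandas 1.0,
    # so this line needs an older release.
    gdf = gpd.read_file(gpd.datasets.get_path("naturalearth_lowres"))

    # A GeoDataFrame with a few points: the vertices of Belgium's outline.
    df = gpd.GeoDataFrame(
        geometry=gdf.loc[gdf["iso_a3"].eq("BEL"), "geometry"]
        .apply(lambda p: p.exterior.coords)
        .explode()
        .apply(shapely.geometry.Point),
        crs="EPSG:4326",
    ).reset_index(drop=True)

    locator = Nominatim(user_agent="pengMappingAgent", timeout=10)

    # Reverse-geocode each point, take the "address" dict from the raw
    # response, and expand it into one column per address component.
    df = df.join(
        df["geometry"]
        .apply(lambda p: locator.reverse(f"{p.y}, {p.x}").raw["address"])
        .apply(pd.Series)
    )

    # to_markdown() requires the tabulate package.
    print(df.head(3).to_markdown(index=False))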
    

Output:

| geometry | road | suburb | city | county | state | postcode | country | country_code | house_number | village | hamlet | town | region | locality | municipality | isolated_dwelling | neighbourhood | tourism |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| POINT (6.15665815595878 50.80372101501058) | A 4 | Verlautenheide | Aachen | Städteregion Aachen | Nordrhein-Westfalen | 52080 | Deutschland | de | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan |
| POINT (6.043073357781111 50.12805166279423) | Beieknapp | nan | nan | Canton Clervaux | nan | 9962 | Lëtzebuerg | lu | 14 | Holler | nan | nan | nan | nan | nan | nan | nan | nan |
| POINT (5.782417433300907 50.09032786722122) | nan | nan | nan | Bastogne | Luxembourg | 6600 | België / Belgique / Belgien | be | nan | Noville | Neufmoulin | Bastogne | Wallonie | nan | nan | nan | nan | nan |
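Since the question only wants street, house number, and neighbourhood, one possible follow-up is to keep just those keys instead of expanding every component. The sketch below is an assumption-laden variant, not part of the answer above: add_address_columns, WANTED, and fake_reverse are hypothetical names, and the geocoder is passed in as a plain callable so the example runs offline. In the question's script that callable would be rgeocode, ideally built with min_delay_seconds=1, because the public Nominatim usage policy allows at most one request per second.

```python
from types import SimpleNamespace

import pandas as pd

# Components to keep; "suburb" is included as a possible fallback for
# locations where Nominatim returns no "neighbourhood" key.
WANTED = ["road", "house_number", "neighbourhood", "suburb"]


def add_address_columns(df, reverse, wanted=WANTED):
    """Reverse-geocode each row and join only the wanted address components.

    `reverse` is any callable that takes a "lat, lon" string and returns a
    Location-like object (with `.raw`) or None. Hypothetical helper.
    """
    rows = []
    for lat, lon in zip(df["lat"], df["lon"]):
        location = reverse(f"{lat}, {lon}")
        address = location.raw.get("address", {}) if location is not None else {}
        rows.append({key: address.get(key) for key in wanted})
    parts = pd.DataFrame(rows, index=df.index, columns=list(wanted))
    return df.join(parts)


def fake_reverse(query):
    """Offline stand-in for a rate-limited locator.reverse (illustration only)."""
    lat, lon = (float(value) for value in query.split(", "))
    known = {
        (50.128, 6.043): {"road": "Beieknapp", "house_number": "14", "village": "Holler"},
    }
    address = known.get((lat, lon))
    return SimpleNamespace(raw={"address": address}) if address else None


points = pd.DataFrame({"lat": [50.128, 0.0], "lon": [6.043, 0.0]})
result = add_address_columns(points, fake_reverse)
print(result)
```

Rows where the lookup returns None, or where a key is absent from the response, end up with empty values in the new columns rather than raising an error.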