python pandas geopandas

Geopandas checking whether point is inside polygon


I have an ocean GeoDataFrame that contains one multipolygon (source: naturalearthdata.com).

I also have another dataframe that contains a lot of longitude and latitude values.

I want to add a new column that is True if the point is in the ocean (inside the multipolygon).

zipfile = "ne_10m_ocean/ne_10m_ocean.shp"
ocean_gpd = geopandas.read_file(zipfile)

df = pd.DataFrame({
    'lon': [120.0,120.1,120.2,120.3,120.4],
    'lat': [10.0,10.1,10.2,10.3,10.4]
})

for index, row in df.iterrows():
    df.loc[index,'is_ocean'] = ocean_gpd.contains(Point(row['lon'],row['lat'])).any()

But this is too slow, so I tried using a lambda function like this:

df = df.assign(is_ocean = lambda x: ocean_gpd.contains(Point(x['lon'],x['lat'])))

but it failed with the error: cannot convert the series to <class 'float'>

Does anyone know a better way to check individual points like this in geopandas?

Note: I just realized that I used the 10m polygon data (the more detailed version). With the 110m data it performs much better, but I may need the 10m data in the future.


Solution

  • You can use apply like this:

    import geopandas
    import pandas as pd
    from shapely.geometry import Point
    
    ocean_gpd = geopandas.read_file('ne_10m_ocean.shp')
    
    df = pd.DataFrame({
        'lon': [120.0, 120.1, 120.2, 120.3, 120.4],
        'lat': [10.0, 10.1, 10.2, 10.3, 10.4]
    })
    
    def in_ocean(row):
        point = Point(row['lon'], row['lat'])
        return ocean_gpd.contains(point).any()
    
    df['is_ocean'] = df.apply(in_ocean, axis=1)
    
    
    

    which returns:

         lon   lat  is_ocean
    0  120.0  10.0     False
    1  120.1  10.1     False
    2  120.2  10.2     False
    3  120.3  10.3     False
    4  120.4  10.4     False
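
  • A possible vectorized alternative, sketched under a few assumptions: it builds all points at once with geopandas.points_from_xy and tests them against the single multipolygon with GeoSeries.within, instead of calling a Python function per row. The square polygon below is a hypothetical stand-in for the ocean layer so the snippet runs without the shapefile; with the real data you would keep ocean_gpd = geopandas.read_file(...) as above. It also assumes the layer has exactly one row, as stated in the question. How much faster this is on the 10m data has not been measured here.

```python
import geopandas
import pandas as pd
from shapely.geometry import Polygon

# Hypothetical stand-in for the ocean layer: one square polygon.
# With the real data, load the shapefile with geopandas.read_file instead.
ocean_gpd = geopandas.GeoDataFrame(
    geometry=[Polygon([(119.95, 9.95), (120.25, 9.95),
                       (120.25, 10.25), (119.95, 10.25)])],
    crs="EPSG:4326",
)

df = pd.DataFrame({
    'lon': [120.0, 120.1, 120.2, 120.3, 120.4],
    'lat': [10.0, 10.1, 10.2, 10.3, 10.4],
})

# Build every point geometry in one call, keeping df's index for alignment.
points = geopandas.GeoSeries(
    geopandas.points_from_xy(df['lon'], df['lat']),
    index=df.index,
    crs=ocean_gpd.crs,
)

# The layer is assumed to hold a single (multi)polygon, so take the first row.
ocean_shape = ocean_gpd.geometry.iloc[0]

# Element-wise test of each point against that one geometry.
df['is_ocean'] = points.within(ocean_shape)
print(df)
```

    Note that within is False for points lying exactly on the polygon boundary. If the layer had more than one row, the geometries would need to be combined first, or a spatial join such as geopandas.sjoin could be used instead.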