I have an ocean GeoDataFrame (geopandas) that contains a single multipolygon (source: naturalearthdata.com).
I also have another DataFrame that contains a lot of longitude and latitude values.
I want to add a new column that is True if the point is in the ocean (inside the multipolygon):
zipfile = "ne_10m_ocean/ne_10m_ocean.shp"
ocean_gpd = geopandas.read_file(zipfile)
df = pd.DataFrame({
    'lon': [120.0, 120.1, 120.2, 120.3, 120.4],
    'lat': [10.0, 10.1, 10.2, 10.3, 10.4]
})
for index, row in df.iterrows():
    df.loc[index, 'is_ocean'] = ocean_gpd.contains(Point(row['lon'], row['lat'])).any()
This is too slow, so I tried a lambda function instead:
df = df.assign(is_ocean = lambda x: ocean_gpd.contains(Point(x['lon'],x['lat'])))
but it fails with the error: cannot convert the series to <class 'float'>
Does anyone know a better way to do this kind of per-point check in geopandas?
Note: I just realized that I used the 10m polygon data (the more detailed polygon). With the 110m data it is a lot faster, but I may need the 10m data in the future.
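You can use apply with axis=1, which passes one row at a time to your function. (With assign, the lambda receives the whole DataFrame, so x['lon'] is a Series rather than a single number, which is why Point raises the float conversion error.) For example: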
import geopandas
import pandas as pd
from shapely.geometry import Point
ocean_gpd = geopandas.read_file('ne_10m_ocean.shp')
df = pd.DataFrame({
    'lon': [120.0, 120.1, 120.2, 120.3, 120.4],
    'lat': [10.0, 10.1, 10.2, 10.3, 10.4]
})
def in_ocean(row):
    # Build a point from this row and test it against the ocean geometry
    point = Point(row['lon'], row['lat'])
    return ocean_gpd.contains(point).any()

df['is_ocean'] = df.apply(in_ocean, axis=1)
which returns:
     lon   lat  is_ocean
0  120.0  10.0     False
1  120.1  10.1     False
2  120.2  10.2     False
3  120.3  10.3     False
4  120.4  10.4     False
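
Since apply still builds and tests one point per row in Python, a vectorized variant may be worth trying on large tables. The sketch below is an illustration under assumptions, not a benchmarked solution: it builds all points at once with geopandas.points_from_xy and tests them in a single GeoSeries.within call. To keep it self-contained, ocean_gpd here is a made-up rectangle standing in for the Natural Earth layer; with the real data you would keep the geopandas.read_file line from above.

```python
import geopandas
import pandas as pd
from shapely.geometry import box

# Stand-in for the ocean layer (assumption, for illustration only):
# one rectangle covering lon 120.15..121.0, lat 10.15..11.0.
ocean_gpd = geopandas.GeoDataFrame(
    geometry=[box(120.15, 10.15, 121.0, 11.0)], crs="EPSG:4326"
)

df = pd.DataFrame({
    'lon': [120.0, 120.1, 120.2, 120.3, 120.4],
    'lat': [10.0, 10.1, 10.2, 10.3, 10.4]
})

# The layer holds a single (multi)polygon, so take it as one shapely geometry.
ocean_geom = ocean_gpd.geometry.iloc[0]

# Build every point in one call, keeping the DataFrame's index for alignment.
points = geopandas.GeoSeries(
    geopandas.points_from_xy(df['lon'], df['lat']),
    index=df.index,
    crs=ocean_gpd.crs,
)

# One vectorized test of all points against the single ocean geometry.
df['is_ocean'] = points.within(ocean_geom)
```

With the toy rectangle, the first two points fall outside it and the last three fall inside. How much this helps with the detailed 10m multipolygon depends on the data, so it is worth timing on a sample first.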