I'm trying to read a GeoJSON file that I created using these steps:
import geopandas as gpd
vec_data = gpd.read_file("map.shp")
vec_data.head()
vec_data['LPIS_name'].unique()
sel_crop = vec_data[vec_data.LPIS_name == 'Permanent Grassland']
sel_crop.to_file("Permanent_Grassland.geojson", driver='GeoJSON')
feature = gpd.read_file("Permanent_Grassland.geojson")
but I'm getting the following error:
{ "name": "DataSourceError",
"message": "Failed to read GeoJSON data",
"stack": "---------------------------------------------------------------------------
DataSourceError Traceback (most recent call last)
Cell In[8], line 1
----> 1 feature = gpd.read_file(path_feature)
File c:\\Users\\bventura\\AppData\\Local\\anaconda3\\Lib\\site-packages\\geopandas\\io\\file.py:294, in _read_file(filename, bbox, mask, columns, rows, engine, **kwargs)
291 from_bytes = True
293 if engine == \"pyogrio\":
--> 294 return _read_file_pyogrio(
295 filename, bbox=bbox, mask=mask, columns=columns, rows=rows, **kwargs
296 )
298 elif engine == \"fiona\":
299 if pd.api.types.is_file_like(filename):
File c:\\Users\\bventura\\AppData\\Local\\anaconda3\\Lib\\site-packages\\geopandas\\io\\file.py:547, in _read_file_pyogrio(path_or_bytes, bbox, mask, rows, **kwargs)
538 warnings.warn(
539 \"The 'include_fields' and 'ignore_fields' keywords are deprecated, and \"
540 \"will be removed in a future release. You can use the 'columns' keyword \"
(...)
543 stacklevel=3,
544 )
545 kwargs[\"columns\"] = kwargs.pop(\"include_fields\")
--> 547 return pyogrio.read_dataframe(path_or_bytes, bbox=bbox, **kwargs)
File c:\\Users\\bventura\\AppData\\Local\\anaconda3\\Lib\\site-packages\\pyogrio\\geopandas.py:261, in read_dataframe(path_or_buffer, layer, encoding, columns, read_geometry, force_2d, skip_features, max_features, where, bbox, mask, fids, sql, sql_dialect, fid_as_index, use_arrow, on_invalid, arrow_to_pandas_kwargs, **kwargs)
256 if not use_arrow:
257 # For arrow, datetimes are read as is.
258 # For numpy IO, datetimes are read as string values to preserve timezone info
259 # as numpy does not directly support timezones.
260 kwargs[\"datetime_as_string\"] = True
--> 261 result = read_func(
262 path_or_buffer,
263 layer=layer,
264 encoding=encoding,
265 columns=columns,
266 read_geometry=read_geometry,
267 force_2d=gdal_force_2d,
268 skip_features=skip_features,
269 max_features=max_features,
270 where=where,
271 bbox=bbox,
272 mask=mask,
273 fids=fids,
274 sql=sql,
275 sql_dialect=sql_dialect,
276 return_fids=fid_as_index,
277 **kwargs,
278 )
280 if use_arrow:
281 meta, table = result
File c:\\Users\\bventura\\AppData\\Local\\anaconda3\\Lib\\site-packages\\pyogrio\\raw.py:196, in read(path_or_buffer, layer, encoding, columns, read_geometry, force_2d, skip_features, max_features, where, bbox, mask, fids, sql, sql_dialect, return_fids, datetime_as_string, **kwargs)
56 \"\"\"Read OGR data source into numpy arrays.
57
58 IMPORTANT: non-linear geometry types (e.g., MultiSurface) are converted
(...)
191
192 \"\"\"
194 dataset_kwargs = _preprocess_options_key_value(kwargs) if kwargs else {}
--> 196 return ogr_read(
197 get_vsi_path_or_buffer(path_or_buffer),
198 layer=layer,
199 encoding=encoding,
200 columns=columns,
201 read_geometry=read_geometry,
202 force_2d=force_2d,
203 skip_features=skip_features,
204 max_features=max_features or 0,
205 where=where,
206 bbox=bbox,
207 mask=_mask_to_wkb(mask),
208 fids=fids,
209 sql=sql,
210 sql_dialect=sql_dialect,
211 return_fids=return_fids,
212 dataset_kwargs=dataset_kwargs,
213 datetime_as_string=datetime_as_string,
214 )
File c:\\Users\\bventura\\AppData\\Local\\anaconda3\\Lib\\site-packages\\pyogrio\\_io.pyx:1239, in pyogrio._io.ogr_read()
File c:\\Users\\bventura\\AppData\\Local\\anaconda3\\Lib\\site-packages\\pyogrio\\_io.pyx:219, in pyogrio._io.ogr_open()
DataSourceError: Failed to read GeoJSON data"
}
As requested, you can download the GeoJSON here to help debug the code.
In the meantime I searched online, and one potential cause seems to be the following: Polygons and MultiPolygons should follow the right-hand rule.
Under the hood, geopandas uses the pyogrio library to read and write files, and pyogrio in turn uses the gdal library.
So I checked whether the error message is more detailed when the gdal Python bindings are used directly, and it is.
When you run the following script:
from osgeo import gdal
gdal.UseExceptions()
path = r"C:\Temp\gras\Permanent_Grassland.geojson"  # raw string so the backslashes are not treated as escapes
gdal.VectorTranslate(srcDS=str(path), destNameOrDestDS="C:/Temp/gras/Permanent_Grassland.gpkg")
It outputs the following error:
RuntimeError: Failed to read GeoJSON data
May be caused by: At line 6, character 51158626: GeoJSON object too
complex/large. You may define the OGR_GEOJSON_MAX_OBJ_SIZE configuration
option to a value in megabytes to allow for larger features, or 0 to
remove any size limit
So apparently one of the geometries is huge. But, as the message indicates, there is a solution: specify that huge features are allowed.
The sample script below allows features of unlimited size, which avoids the error. I'm using pyogrio to set the configuration option so that you don't have to install the gdal Python bindings: they are not installed by default with geopandas and can be more difficult to install if you use plain pip:
import geopandas as gpd
import pyogrio
path = r"C:\Temp\gras\Permanent_Grassland.geojson"  # raw string so the backslashes are not treated as escapes
pyogrio.set_gdal_config_options({"OGR_GEOJSON_MAX_OBJ_SIZE": 0})
gdf = gpd.read_file(path)
print(gdf)
Apparently your geojson file consists of only a single huge multipolygon:
     ID       DESCR_IT  ...            LPIS_name                                           geometry
0  3097  Prato stabile  ...  Permanent Grassland  MULTIPOLYGON (((10.51872 46.69302, 10.51878 46...

[1 rows x 8 columns]
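If you want to see which features are large without involving GDAL at all, a rough check with only the standard library is possible. The sketch below is an illustration, not something GDAL provides: `feature_sizes_mb` is a hypothetical helper name, and the size it reports is just the length of each feature when re-serialised as JSON text, which may differ from how GDAL itself accounts for object size.

```python
import json


def feature_sizes_mb(geojson_text):
    """Return (feature index, approximate size in MB) pairs, largest first.

    Hypothetical helper: the size is the UTF-8 length of the re-serialised
    feature, used only as a rough proxy for how big each feature is.
    """
    collection = json.loads(geojson_text)
    sizes = []
    for index, feature in enumerate(collection.get("features", [])):
        # Re-serialise the single feature and count its bytes.
        n_bytes = len(json.dumps(feature).encode("utf-8"))
        sizes.append((index, n_bytes / 1_000_000))
    # Sort so the largest feature comes first.
    return sorted(sizes, key=lambda item: item[1], reverse=True)


# Usage (path is whatever file you are inspecting):
# with open("Permanent_Grassland.geojson", encoding="utf-8") as f:
#     print(feature_sizes_mb(f.read())[:5])
```

Note that this loads the whole file into memory, so it is only meant as a quick diagnostic.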
FYI: I opened an issue in the pyogrio issue tracker to check if it is possible to also show this detailed error when reading the file via pyogrio/geopandas: https://github.com/geopandas/pyogrio/issues/491
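The sample script above sets `OGR_GEOJSON_MAX_OBJ_SIZE` for the rest of the session. If you would rather lift the limit only while reading this one file, a small context manager can restore the previous value afterwards. This is a minimal sketch: `temporary_gdal_option` is a hypothetical helper, and the `get_option` / `set_options` parameters exist only so it can be tried without pyogrio installed. By default it falls back to `pyogrio.get_gdal_config_option` and `pyogrio.set_gdal_config_options`, and it relies on pyogrio treating a value of `None` as "clear this option".

```python
from contextlib import contextmanager


@contextmanager
def temporary_gdal_option(name, value, get_option=None, set_options=None):
    """Set a GDAL config option inside a with-block, then restore it.

    Hypothetical helper; get_option/set_options default to pyogrio's
    functions and are injectable only for illustration and testing.
    """
    if get_option is None or set_options is None:
        import pyogrio  # imported lazily so the helper itself has no hard dependency

        get_option = pyogrio.get_gdal_config_option
        set_options = pyogrio.set_gdal_config_options

    previous = get_option(name)  # None if the option was not set before
    set_options({name: value})
    try:
        yield
    finally:
        # Restore the old value; passing None clears the option again.
        set_options({name: previous})


# Usage, assuming geopandas and pyogrio are installed:
# import geopandas as gpd
# with temporary_gdal_option("OGR_GEOJSON_MAX_OBJ_SIZE", 0):
#     gdf = gpd.read_file(path)
```

Whether the scoped version is worth it depends on your script; for a one-off notebook the global setting is simpler.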