python geopandas shapefile

Geopandas fails reading geojson (SOLVED)


I'm trying to read a GeoJSON file that I created with the following steps:

import geopandas as gpd

vec_data = gpd.read_file("map.shp") 
vec_data.head()
vec_data['LPIS_name'].unique()
sel_crop = vec_data[vec_data.LPIS_name == 'Permanent Grassland']
sel_crop.to_file("Permanent_Grassland.geojson", driver='GeoJSON')
feature = gpd.read_file("Permanent_Grassland.geojson")

but I'm getting the following error:

{       "name": "DataSourceError",
        "message": "Failed to read GeoJSON data",
        "stack": "---------------------------------------------------------------------------
    DataSourceError                           Traceback (most recent call last)
    Cell In[8], line 1
    ----> 1 feature = gpd.read_file(path_feature)
    
    File c:\\Users\\bventura\\AppData\\Local\\anaconda3\\Lib\\site-packages\\geopandas\\io\\file.py:294, in _read_file(filename, bbox, mask, columns, rows, engine, **kwargs)
        291             from_bytes = True
        293 if engine == \"pyogrio\":
    --> 294     return _read_file_pyogrio(
        295         filename, bbox=bbox, mask=mask, columns=columns, rows=rows, **kwargs
        296     )
        298 elif engine == \"fiona\":
        299     if pd.api.types.is_file_like(filename):
    
    File c:\\Users\\bventura\\AppData\\Local\\anaconda3\\Lib\\site-packages\\geopandas\\io\\file.py:547, in _read_file_pyogrio(path_or_bytes, bbox, mask, rows, **kwargs)
        538     warnings.warn(
        539         \"The 'include_fields' and 'ignore_fields' keywords are deprecated, and \"
        540         \"will be removed in a future release. You can use the 'columns' keyword \"
       (...)
        543         stacklevel=3,
        544     )
        545     kwargs[\"columns\"] = kwargs.pop(\"include_fields\")
    --> 547 return pyogrio.read_dataframe(path_or_bytes, bbox=bbox, **kwargs)
    
    File c:\\Users\\bventura\\AppData\\Local\\anaconda3\\Lib\\site-packages\\pyogrio\\geopandas.py:261, in read_dataframe(path_or_buffer, layer, encoding, columns, read_geometry, force_2d, skip_features, max_features, where, bbox, mask, fids, sql, sql_dialect, fid_as_index, use_arrow, on_invalid, arrow_to_pandas_kwargs, **kwargs)
        256 if not use_arrow:
        257     # For arrow, datetimes are read as is.
        258     # For numpy IO, datetimes are read as string values to preserve timezone info
        259     # as numpy does not directly support timezones.
        260     kwargs[\"datetime_as_string\"] = True
    --> 261 result = read_func(
        262     path_or_buffer,
        263     layer=layer,
        264     encoding=encoding,
        265     columns=columns,
        266     read_geometry=read_geometry,
        267     force_2d=gdal_force_2d,
        268     skip_features=skip_features,
        269     max_features=max_features,
        270     where=where,
        271     bbox=bbox,
        272     mask=mask,
        273     fids=fids,
        274     sql=sql,
        275     sql_dialect=sql_dialect,
        276     return_fids=fid_as_index,
        277     **kwargs,
        278 )
        280 if use_arrow:
        281     meta, table = result
    
    File c:\\Users\\bventura\\AppData\\Local\\anaconda3\\Lib\\site-packages\\pyogrio\\raw.py:196, in read(path_or_buffer, layer, encoding, columns, read_geometry, force_2d, skip_features, max_features, where, bbox, mask, fids, sql, sql_dialect, return_fids, datetime_as_string, **kwargs)
         56 \"\"\"Read OGR data source into numpy arrays.
         57 
         58 IMPORTANT: non-linear geometry types (e.g., MultiSurface) are converted
       (...)
        191 
        192 \"\"\"
        194 dataset_kwargs = _preprocess_options_key_value(kwargs) if kwargs else {}
    --> 196 return ogr_read(
        197     get_vsi_path_or_buffer(path_or_buffer),
        198     layer=layer,
        199     encoding=encoding,
        200     columns=columns,
        201     read_geometry=read_geometry,
        202     force_2d=force_2d,
        203     skip_features=skip_features,
        204     max_features=max_features or 0,
        205     where=where,
        206     bbox=bbox,
        207     mask=_mask_to_wkb(mask),
        208     fids=fids,
        209     sql=sql,
        210     sql_dialect=sql_dialect,
        211     return_fids=return_fids,
        212     dataset_kwargs=dataset_kwargs,
        213     datetime_as_string=datetime_as_string,
        214 )
    
    File c:\\Users\\bventura\\AppData\\Local\\anaconda3\\Lib\\site-packages\\pyogrio\\_io.pyx:1239, in pyogrio._io.ogr_read()
    
    File c:\\Users\\bventura\\AppData\\Local\\anaconda3\\Lib\\site-packages\\pyogrio\\_io.pyx:219, in pyogrio._io.ogr_open()
    
    DataSourceError: Failed to read GeoJSON data"

    }

As requested, you can download the GeoJSON here to make debugging easier.

In the meantime I searched online, and one potential cause seems to be the following: Polygons and MultiPolygons should follow the right-hand rule.


Solution

  • Under the hood, geopandas uses the pyogrio library to read and write files, and pyogrio in turn uses the GDAL library.

    So I checked whether the error message is more detailed when the GDAL Python bindings are used directly, and it is.

    When you run the following script:

    from osgeo import gdal
    
    gdal.UseExceptions()
    # Raw string so the backslashes in the Windows path are not treated as escapes.
    path = r"C:\Temp\gras\Permanent_Grassland.geojson"
    gdal.VectorTranslate(srcDS=path, destNameOrDestDS="C:/Temp/gras/Permanent_Grassland.gpkg")
    

    It outputs the following error:

    RuntimeError: Failed to read GeoJSON data
    May be caused by: At line 6, character 51158626: GeoJSON object too 
    complex/large. You may define the OGR_GEOJSON_MAX_OBJ_SIZE configuration
    option to a value in megabytes to allow for larger features, or 0 to
    remove any size limit
    

    So apparently one of the geometries is pretty huge... But, as indicated, there is a solution: specify that huge features are allowed.
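
Before lifting the limit, it can be useful to confirm which feature is oversized. Python's built-in `json` module does not go through GDAL, so it is not affected by `OGR_GEOJSON_MAX_OBJ_SIZE`. The following is a rough sketch, assuming the file is a GeoJSON FeatureCollection; `feature_sizes` is a hypothetical helper name, the sample data is made up, and the serialized length is only a proxy for the size GDAL measures.

```python
import json


def feature_sizes(collection):
    """Return (index, geometry type, serialized length) per feature, largest first.

    `collection` is a parsed GeoJSON FeatureCollection (a dict).
    """
    sizes = []
    for index, feature in enumerate(collection.get("features", [])):
        geometry = feature.get("geometry") or {}
        # Length of the re-serialized feature: a proxy for its size on disk.
        size = len(json.dumps(feature))
        sizes.append((index, geometry.get("type"), size))
    return sorted(sizes, key=lambda item: item[2], reverse=True)


# Tiny in-memory sample standing in for the real file (made-up coordinates).
sample = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"ID": 1},
         "geometry": {"type": "Point", "coordinates": [10.5, 46.7]}},
        {"type": "Feature", "properties": {"ID": 2},
         "geometry": {"type": "MultiPolygon", "coordinates": [
             [[[0, 0], [1, 0], [1, 1], [0, 0]]],
             [[[2, 2], [3, 2], [3, 3], [2, 2]]],
         ]}},
    ],
}

for index, geom_type, size in feature_sizes(sample):
    print(index, geom_type, size)
```

For the real file, the parsed collection would come from `json.load` on an open handle to `Permanent_Grassland.geojson` instead of the in-memory `sample`.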

    Here is a sample script that allows features of unlimited size, which avoids the error. I'm using pyogrio to set the configuration option so that you don't have to install the GDAL Python bindings: they are not installed by default with geopandas and can be more difficult to install if you are using plain pip:

    import geopandas as gpd
    import pyogrio
    
    path = r"C:\Temp\gras\Permanent_Grassland.geojson"
    # A value of 0 removes the size limit on individual GeoJSON objects.
    pyogrio.set_gdal_config_options({"OGR_GEOJSON_MAX_OBJ_SIZE": 0})
    gdf = gpd.read_file(path)
    print(gdf)
    

    Apparently your geojson file consists of only a single huge multipolygon:

         ID       DESCR_IT  ...            LPIS_name                                           geometry
    0  3097  Prato stabile  ...  Permanent Grassland  MULTIPOLYGON (((10.51872 46.69302, 10.51878 46...
    
    [1 rows x 8 columns]
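
Since the whole file is one huge MultiPolygon feature, another option is to split it into one Polygon feature per part, so that no single feature is that large. The sketch below does this on the parsed GeoJSON with the standard library only, so it does not depend on GDAL's size limit. It assumes a FeatureCollection; `split_multipolygons` is a hypothetical helper name and the sample data is made up. Whether the resulting features stay under GDAL's default limit depends on the data.

```python
import copy


def split_multipolygons(collection):
    """Return a new FeatureCollection in which every MultiPolygon feature is
    replaced by one Polygon feature per part; other features are kept as-is."""
    features = []
    for feature in collection.get("features", []):
        geometry = feature.get("geometry") or {}
        if geometry.get("type") != "MultiPolygon":
            features.append(feature)
            continue
        # Each entry of a MultiPolygon's coordinates is one Polygon's ring list.
        for rings in geometry["coordinates"]:
            features.append({
                "type": "Feature",
                # Every part gets its own copy of the original attributes.
                "properties": copy.deepcopy(feature.get("properties")),
                "geometry": {"type": "Polygon", "coordinates": rings},
            })
    # Keep any other top-level members (for example "name" or "crs") unchanged.
    result = {key: value for key, value in collection.items() if key != "features"}
    result["features"] = features
    return result


# Tiny in-memory sample standing in for the real file (made-up coordinates).
sample = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"ID": 3097},
         "geometry": {"type": "MultiPolygon", "coordinates": [
             [[[0, 0], [1, 0], [1, 1], [0, 0]]],
             [[[2, 2], [3, 2], [3, 3], [2, 2]]],
         ]}},
    ],
}

split = split_multipolygons(sample)
print(len(split["features"]))
```

For the real file, the input would come from `json.load` and the result would be written back with `json.dump`. Once the data is already in a GeoDataFrame, `gdf.explode()` performs the same kind of split.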
    

    FYI: I opened an issue in the pyogrio issue tracker to check if it is possible to also show this detailed error when reading the file via pyogrio/geopandas: https://github.com/geopandas/pyogrio/issues/491