python, python-3.x, geopandas

Why does the GeoPandas dissolve function run forever?


I am trying to use the GeoPandas dissolve function to aggregate a few countries into one region, but countries.dissolve never finishes. Here is a minimal script:

import geopandas as gpd

shape='/Volumes/TwoGb/shape/fwdshapfileoftheworld/'
countries=gpd.read_file(shape+'TM_WORLD_BORDERS-0.3.shp')


# Add columns
countries['wmosubregion'] = ''
countries['dummy'] = ''

country_count = len(countries)

# ISO2 codes of the countries to merge into one region; every other country gets the default label.
country_list=['SO','SD','DJ','KM']
default = 'Null'
for i in range(country_count):
    countries.at[i, 'wmosubregion'] = default
    if countries.ISO2[i] in country_list:
        countries.at[i, 'wmosubregion'] = "EAST_AFRICA"
        print(countries.ISO2[i])

region_shapes = countries.dissolve(by='wmosubregion')

I am using the TM_WORLD_BORDERS-0.3 shapefiles, which are freely available. You can get the files (TM_WORLD_BORDERS-0.3.shp, TM_WORLD_BORDERS-0.3.dbf, TM_WORLD_BORDERS-0.3.shx) from the following GitHub repository: https://github.com/rmichnovicz/Sick-Slopes/tree/master
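The row-by-row labelling loop can also be written as one vectorized assignment, followed by the same dissolve(by='wmosubregion') call. The sketch below is only illustrative: instead of the shapefile it uses four hypothetical unit-square "countries" so it runs on its own. It shows what dissolve is expected to produce (one row per label, with the geometries in each group unioned). It is not by itself a fix for the hang.

```python
import geopandas as gpd
from shapely.geometry import box

# Hypothetical stand-in for TM_WORLD_BORDERS: four unit squares tagged with
# ISO2 codes. SO/DJ touch each other, and so do FR/DE.
countries = gpd.GeoDataFrame(
    {"ISO2": ["SO", "DJ", "FR", "DE"]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1), box(5, 5, 6, 6), box(6, 5, 7, 6)],
)

country_list = ["SO", "SD", "DJ", "KM"]

# Vectorized replacement for the loop: default label first, then overwrite
# the rows whose ISO2 code is in country_list.
countries["wmosubregion"] = "Null"
countries.loc[countries["ISO2"].isin(country_list), "wmosubregion"] = "EAST_AFRICA"

# One output row per distinct label. Geometries in each group are unioned,
# and the other columns keep the first value in the group (aggfunc="first").
region_shapes = countries.dissolve(by="wmosubregion")
print(region_shapes.geometry.area)
```

On the real data, the same two lines would replace the for loop. The dissolve call itself does not change.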

Thanks


Solution

  • Dissolve works when I try it; it finishes in a few seconds. My GeoPandas version is 1.0.1.

    import geopandas as gpd
    df = gpd.read_file(r"C:\Users\bera\Downloads\TM_WORLD_BORDERS-0.3.shp")
    df.plot(column="NAME")
    

    df2 = df.dissolve()
    df2.plot()
    

    Some geometries in the file are invalid, and they might be what is causing the problem for you. Try fixing them:

    df.geometry.is_valid.all()
    # np.False_

    # Four geometries are invalid:
    df.loc[~df.geometry.is_valid]
    #     FIPS ISO2  ...     LAT                                           geometry
    # 23    CA   CA  ...  59.081  MULTIPOLYGON (((-65.61362 43.42027, -65.61972 ...
    # 32    CI   CL  ... -23.389  MULTIPOLYGON (((-67.21278 -55.89362, -67.24695...
    # 154   NO   NO  ...  61.152  MULTIPOLYGON (((8.74361 58.40972, 8.73194 58.4...
    # 174   RS   RU  ...  61.988  MULTIPOLYGON (((131.87329 42.95694, 131.82413 ...
    # [4 rows x 12 columns]

    # Repair the invalid geometries, then check again
    df.geometry = df.geometry.make_valid()
    df.geometry.is_valid.all()
    # np.True_
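The same check, repair and dissolve sequence can be tried without the shapefile. The sketch below is illustrative and assumes hypothetical data: one valid square and one self-intersecting "bow-tie" polygon, which is a standard example of an invalid geometry. The four invalid countries in TM_WORLD_BORDERS may be invalid for other reasons. make_valid() turns the bow-tie into two triangles, and the dissolve then runs on valid input.

```python
import geopandas as gpd
from shapely.geometry import Polygon, box

# Hypothetical data: a valid unit square and a bow-tie whose edges cross at (1, 1).
bowtie = Polygon([(0, 0), (2, 2), (2, 0), (0, 2), (0, 0)])
gdf = gpd.GeoDataFrame(
    {"region": ["A", "A"]},
    geometry=[box(3, 0, 4, 1), bowtie],
)

print(gdf.geometry.is_valid.all())     # False: the bow-tie is self-intersecting
print(gdf.loc[~gdf.geometry.is_valid])  # shows only the bow-tie row

# make_valid() splits the bow-tie into a MultiPolygon of two triangles.
gdf.geometry = gdf.geometry.make_valid()
print(gdf.geometry.is_valid.all())     # True after the repair

# Dissolve on valid geometries: one row for region "A".
dissolved = gdf.dissolve(by="region")
print(dissolved.geometry.area)
```

The repaired bow-tie covers two triangles of area 1 each. The dissolved region therefore has an area of 3, made up of those triangles plus the unit square.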