python geospatial computational-geometry geopandas shapely

Merging Geodataframe Polygons to Meet Population Threshold in Python


I have a geodataframe in Python containing polygons representing various regions, with each region having its respective population count. My goal is to merge these polygons in a way that ensures each resulting polygon contains at least a specified population threshold (x).

Here's a brief overview of the problem:

Input Data: A geodataframe where each row represents a polygon (region) with its associated population count.

Goal: Merge neighboring polygons recursively until every resulting polygon meets a minimum population threshold 'x', ideally while keeping the highest possible granularity.

I'm looking for recommendations on algorithms (or frameworks) that can handle this type of spatial operation efficiently. Ideally, the solution should scale to large datasets. I have tried to implement it myself but keep running into issues, e.g. having to recalculate the neighbors after each iteration or to update the geometries of the resulting polygons. Googling did not turn up anything helpful either.

Some specific questions I have:

  1. Are there any established algorithms commonly used for this type of spatial merging based on population thresholds?
  2. Are there any Python libraries or frameworks that solve this task?
  3. Are there any best practices or considerations I should keep in mind while implementing this solution? I'm aware that vectorization and spatial indexing can help speed up the algorithm.

Any guidance, suggestions, or code snippets would be greatly appreciated! Thanks in advance for your help.


Solution

  • I think the "Max-P Regionalization" algorithm (e.g. implemented in pysal/spopt/maxp) might be a good solution:

    The max-p problem involves the clustering of a set of geographic areas into the maximum number of homogeneous regions such that the value of a spatially extensive regional attribute is above a predefined threshold value. The spatially extensive attribute can be specified to ensure that each region contains sufficient population size, or a minimum number of enumeration units. The number of regions is endogenous to the problem and is useful for regionalization problems where the analyst does not require a fixed number of regions a-priori.
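
    To make the threshold constraint concrete, here is a minimal pure-Python sketch of a simple greedy merge. It is not the max-p algorithm and not what spopt does internally (max-p additionally tries to maximize the number of regions and their homogeneity); it only illustrates that the merging can run entirely on an adjacency graph, so neighbors are updated in place rather than recomputed from geometries. The function name `merge_to_threshold` and the input dictionaries are hypothetical. With geopandas, the adjacency would presumably be derived from the polygons once up front (for example via contiguity weights from libpysal), and the geometries could be combined once at the end with `GeoDataFrame.dissolve` on the resulting labels.

```python
def merge_to_threshold(population, neighbors, threshold):
    """Greedy sketch (not max-p): repeatedly merge the least populated
    below-threshold region into its least populated neighboring region.

    population: dict mapping area id -> population count
    neighbors:  dict mapping area id -> set of adjacent area ids (symmetric)
    Returns a dict mapping each area id -> label of its final region.
    """
    members = {i: {i} for i in population}   # region label -> area ids in it
    pop = dict(population)                   # region label -> total population
    adj = {i: set(neighbors.get(i, ())) for i in population}  # region adjacency

    while True:
        # Regions still below the threshold that have something to merge with.
        small = [r for r in pop if pop[r] < threshold and adj[r]]
        if not small:
            break
        r = min(small, key=lambda k: (pop[k], k))
        target = min(adj[r], key=lambda k: (pop[k], k))

        # Fold region r into target and rewire the adjacency in place.
        members[target] |= members.pop(r)
        pop[target] += pop.pop(r)
        for n in adj.pop(r):
            adj[n].discard(r)
            if n != target:
                adj[n].add(target)
                adj[target].add(n)

    return {i: label for label, ids in members.items() for i in ids}


# Toy example: five areas in a chain 0-1-2-3-4 (hypothetical data).
population = {0: 30, 1: 40, 2: 100, 3: 20, 4: 90}
neighbors = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
labels = merge_to_threshold(population, neighbors, threshold=100)
```

    In this toy run, areas 0, 1 and 2 end up in one region (population 170) and areas 3 and 4 in another (population 110). Isolated areas with no neighbors stay below the threshold in this sketch and would need separate handling. Because the merge order is greedy, the result is generally not the maximum possible number of regions, which is the gap the max-p formulation is meant to close.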