Merging one kml with multiple polygons with another kml with multiple polygons that sometimes overlap

I am trying to merge two kmls with multiple polygons in a map area so I can after estimate the population inside the merged area. Polygons in one kml sometimes overlap completely or partially with ones in the other kml. I have tried using geopandas and the unary union function, but instead of just merging polygons that are overlapping polygons in the other kml, it sometimes merges separate polygons creating a larger area which I do not want. I want the area covered by both kmls to be the exact same as covered in the merged kml, just whatever was overlapping joins to form one polygon.

Here is what I've tried so far and what it results in:

from shapely.geometry import shape
import geopandas as gpd
import matplotlib.pyplot as plt

polygons = []
kml_files = ["merged_boundaries_Montreal2_2024-06-23_5PTime_5.0PTime_transit.kml", "merged_boundaries_MTL_2_45bit_5.0PTime.kml"]
# Read KML files and extract polygons
for kml_file_path in kml_files:
    gdf = gpd.read_file(kml_file_path)
    for geom in gdf.geometry:
        polygons.append(shape(geom))



mergedPolys = unary_union(polygons)

gpd.GeoSeries([mergedPolys]).boundary.plot()
plt.show()

output_file = "merged_boundaries_Montreal2_2024-08-25-TandB.kml"    
gdf = gpd.GeoDataFrame(geometry=[mergedPolys])

 # Write the GeoDataFrame to a KML file
gdf.to_file(output_file, driver='KML')type here

The two kmls I want to merge are the blue and green, and the merged kml with the code above is grey. I want the grey kml to cover exactly the blue and green regions, but it covers other unwanted regions: enter image description here

Solution

You are not filtering out the overlapping polygons in the input files, so all polygons in both input files are just all merged together.

The typical way to filter overlapping geometries using geopandas is with geopandas.sjoin and the typical way to merge polygons in a dataframe is with geopandas.dissolve.

Sample script:

import geopandas as gpd
from matplotlib import pyplot as plt
import pandas as pd
from shapely import box


# Normally... read kml files, but for this sample script, create sample data.
# file1_gdf = gpd.read_file("test1.kml", driver="kml")
file1_gdf = gpd.GeoDataFrame(geometry=[box(0, 0, 10, 10), box(20, 0, 30, 10)])
# file2_gdf = gpd.read_file("test2.kml", driver="kml")
file2_gdf = gpd.GeoDataFrame(geometry=[box(5, 5, 15, 15), box(20, 15, 25, 20)])

# Find overlapping polygons, in both directions.
file1_overlapping_gdf = file1_gdf.sjoin(file2_gdf, how="inner")
file2_overlapping_gdf = file2_gdf.sjoin(file1_gdf, how="inner")

# Concat both results and merge/dissolve the overlapping polygons.
overlapping_gdf = pd.concat([file1_overlapping_gdf, file2_overlapping_gdf])
dissolved_overlapping_gdf = overlapping_gdf.dissolve()

# Plot input and result.
ax = file1_gdf.plot(color="blue", alpha=0.2)
file2_gdf.plot(color="green", ax=ax, alpha=0.2)
dissolved_overlapping_gdf.plot(ax=ax, facecolor="none", edgecolor="red", hatch="/")
plt.show()

Input (blue and green polygons) and result (red hatched):