I am using Shapely to intersect geographical data and Pyshp to write the results to a shapefile. I don't completely understand how "intersection" works.
First, I create some points and calculate the Voronoi polygons this way:
# (1)
from shapely import LineString, MultiPoint, Point, Polygon, MultiPolygon
from shapely.geometry import shape
from shapely.constructive import voronoi_polygons
from shapely import intersection
import shapefile
output_shapefile = "/tmp/simple_test_01.shp"
points = MultiPoint([Point(1,1), Point(2,2), Point(4,3), Point(2,4), Point(2,5), Point(6,5), Point(5,4), Point(7,1)])
myVoronoi = voronoi_polygons(points)
#>>> myVoronoi
#<GEOMETRYCOLLECTION (POLYGON ((-5 -5, -5 4.625, -4.5 4.5, 0 3, 4 -1, 4 -5, -...>
n = 0
with shapefile.Writer(output_shapefile, shapeType=5) as w:
w.field("ID", "N")
for geom in myVoronoi.geoms:
w.shape(geom)
w.record(n)
n += 1
When I draw the shapefile on ArcMap, I get this (green points highlighted by me, not in the shapefile):
I create a polygon with which I will intersect the Voronoi polygons:
# (2)
polygon_to_intersect = Polygon([Point(0,0), Point(2,8), Point(4,3.7), Point(10,4.5), Point(5,-4), Point(0,0)])
output_shapefile = "/tmp/simple_test_02.shp"
with shapefile.Writer(output_shapefile, shapeType=5) as w:
w.field("ID", "N")
w.shape(polygon_to_intersect)
w.record(1)
On ArcMap it is like this (shown with red outline):
I perform the intersection:
# (3)
intersected = intersection(myVoronoi, polygon_to_intersect)
>>> intersected
<POLYGON ((0.6 2.4, 0.75 3, 1.125 4.5, 2 8, 3.553 4.66, 3.739 4.261, 4 3.7, ...>
output_shapefile = "/tmp/simple_test_03.shp"
n = 0
with shapefile.Writer(output_shapefile) as w:
w.field("ID", "N")
w.shape(intersected)
w.record(n)
simple_test_03.shp is a single polygon. Its vertices can be seen here:
I was expecting "intersection" to give me the Voronoi polygons intersected with polygon_to_intersect. However, I am getting polygon_to_intersect with new vertices, in the places where the polygon's exterior intersects the Voronoi polygons.
>>> intersected
<POLYGON ((0.6 2.4, 0.75 3, 1.125 4.5, 2 8, 3.553 4.66, 3.739 4.261, 4 3.7, ...>
"intersected" is a Polygon instead of a MultiPolygon, why?
However, if I do this:
# (4)
output_shapefile = "/tmp/simple_test_04.shp"
multipol = []
for geom in myVoronoi.geoms:
multipol.append(intersection(geom, polygon_to_intersect))
n = 0
with shapefile.Writer(output_shapefile) as w:
w.field("ID", "N")
for pol in multipol:
w.shape(pol)
w.record(n)
n += 1
I get this result, which is what I wanted in the first place:
Each colored area is a Voronoi polygon intersected with "polygon_to_intersect".
Why do I need to do (4) to get what I want, when I was expecting (3) to work that way? Is it the fact that myVoronoi is a GEOMETRYCOLLECTION instead of a POLYGON or MULTIPOLYGON?
I tried to convert myVoronoi to a MultiPolygon prior to the intersection:
# (5)
output_shapefile = "/tmp/simple_test_05.shp"
myVoronoi2 = MultiPolygon(myVoronoi)
>>> myVoronoi2
<MULTIPOLYGON (((-5 -5, -5 4.625, -4.5 4.5, 0 3, 4 -1, 4 -5, -5 -5)), ((-5 1...>
intersected2 = intersection(myVoronoi2, polygon_to_intersect)
>>> intersected2
<POLYGON ((0.6 2.4, 0.75 3, 1.125 4.5, 2 8, 3.553 4.66, 3.739 4.261, 4 3.7, ...>
n = 0
with shapefile.Writer(output_shapefile) as w:
w.field("ID", "N")
w.shape(intersected2)
w.record(n)
>>> intersected2 == intersected
True
but the result is the same as in (3).
This is a simplified example. In my real situation, "myVoronoi" has 18000+ polygons and "polygon_to_intersect" is much bigger. (3) is not working as I want and (4) works but has performance issues.
[EDIT] This is my real situation. My Voronoi polygons: Some zoom in: Some more zoom:
And this is my "polygon_to_intersect" (outline in orange): With some zoom:
As mentioned by @Pieter, the intersection works with "union semantics", which means my Voronoi polygons get unioned away first, and then intersected with my polygon_to_intersect.
But that is not what I want. I want this (each Voronoi polygon in the result file; some of them intersected at the polygon_to_intersect boundaries):
So I used this code:
output_shapefile = 'intersected.shp'
n = 0
w = 0
i = 0
skipped = 0
with shapefile.Writer(output_shapefile) as writer:
writer.field("ID", "N")
for poly in myVoronoi:
n += 1
if not (n % 500):
print(f'Processed {n}, within = {w}, intersected = {i}, skipped = {skipped}')
# If the Voronoi polygon is fully inside polygon_to_intersect, don't waste time intersecting
# Just write the polygon into the output file
if within(poly, polygon_to_intersect):
writer.shape(poly)
writer.record(n)
w += 1
else:
# Perform the intersection
intersected = poly.intersection(polygon_to_intersect)
if intersected:
writer.shape(intersected)
writer.record(n + 50000)
i += 1
else:
skipped += 1
This works fine but it took almost 25 minutes to complete. There are 18.600+ polygons in myVoronoi and polygon_to_intersect has several hundreds of thousands of vertices.
Pieter's suggestion of using a Rtree didn't help much here in terms of performance.
tree = shapely.STRtree(voronoi_polys)
intersecting_idx = tree.query(polygon_to_intersect, predicate="intersects")
intersections = shapely.intersection(voronoi_polys.take(intersecting_idx), polygon_to_intersect)
Because it performs many more intersections than in my code above.
So I went and created a much simplified "polygon_to_intersect":
which has only 76 vertices.
And modified my code:
# Read the new simplified shapefile into memory
simplified_file = 'simplified.shp'
with shapefile.Reader(simplified_file, shapeType=5) as w:
for shapeRec in w.shapeRecords():
polygon_to_intersect_simplified = shape(shapeRec.shape)
prepare(polygon_to_intersect_simplified)
output_shapefile = 'intersected.shp'
n = 0
w = 0
i = 0
skipped = 0
with shapefile.Writer(output_shapefile) as writer:
writer.field("ID", "N")
for poly in myVoronoi:
n += 1
if not (n % 500):
print(f'Processed {n}, within = {w}, intersected = {i}, skipped = {skipped}')
# If the Voronoi polygon is fully inside the simplified shape, don't waste time intersecting
# Just write the polygon into the output file
if within(poly, polygon_to_intersect_simplified):
writer.shape(poly)
writer.record(n)
w += 1
else:
# Perform the intersection with the real (not simplified) polygon
intersected = poly.intersection(polygon_to_intersect)
if intersected:
writer.shape(intersected)
writer.record(n + 50000)
i += 1
else:
skipped += 1
"within" is computed 18.600+ times against the much simplified polygon, which is faster. Polygons within are not intersected, so time is saved there. Then, those polygons not "within", are intersected with the real polygon_to_intersect.
The modified code runs in under 4 minutes.
Without the simplified polygon:
With the simplified polygon:
When doing overlays with GeometryCollections the inner boundaries are (intentionally) first unioned away, so this result is to be expected (source).
It should be possible to solve the performance issue of the loop by using an rtree index, as shown in the following code sample:
import numpy as np
import shapely
from shapely import MultiPoint, Polygon
points = MultiPoint([(1,1), (2,2), (4,3), (2,4), (2,5), (6,5), (5,4), (7,1)])
myVoronoi = shapely.voronoi_polygons(points)
polygon_to_intersect = Polygon([(0,0), (2,8), (4,3.7), (10,4.5), (5,-4), (0,0)])
voronoi_polys = np.array(myVoronoi.geoms)
tree = shapely.STRtree(voronoi_polys)
intersecting_idx = tree.query(polygon_to_intersect, predicate="intersects")
intersections = shapely.intersection(voronoi_polys.take(intersecting_idx), polygon_to_intersect)
print(intersections)