I have a graph where each node has coordinates in 2D (it's actually a geographic graph, with latitude and longitude.) I need to verify that if the distance between two edges is less than MAX_DIST then they share a node. Of course, if they intersect, then the distance between them is zero.
The brute force algorithm is trivial, is there a more efficient algorithm?
I was thinking of trying to adapt https://en.wikipedia.org/wiki/Closest_pair_of_points_problem to graph edges (and ignoring pairs of edges with a shared node), but it is not trivial to do so.
I was curios to see how the rtree index idea would perform so I created a small script to test it using two really cool libraries for Python: Rtree and shapely
The snippet generates 1000 segments with 1 < length < 5 and coordinates in the [0, 100] interval, populates the index and then counts the pairs that are closer than MAX_DIST==0.1 (using the classic and the index-based method).
In my tests the index method was around 25x faster using the conditions above; this might vary greatly for your data set but the result is encouraging:
found 532 pairs of close segments using classic method
7.47 seconds for classic count
found 532 pairs of close segments using index method
0.28 seconds for index count
The performance and correctness of the index method depends on how your segments are distributed (how many are close, if you have very long segments, the parameters used).
import time
import random
from rtree import Rtree
from shapely.geometry import LineString
def generate_segments(number):
segments = {}
for i in range(number):
while True:
x1 = random.randint(0, 100)
y1 = random.randint(0, 100)
x2 = random.randint(0, 100)
y2 = random.randint(0, 100)
segment = LineString([(x1, y1), (x2, y2)])
if 1 < segment.length < 5: # only add relatively small segments
segments[i] = segment
break
return segments
def populate_index(segments):
idx = Rtree()
for index, segment in segments.items():
idx.add(index, segment.bounds)
return idx
def count_close_segments(segments, max_distance):
count = 0
for i in range(len(segments)-1):
s1 = segments[i]
for j in range(i+1, len(segments)):
s2 = segments[j]
if s1.distance(s2) < max_distance:
count += 1
return count
def count_close_segments_index(segments, idx, max_distance):
count = 0
for index, segment in segments.items():
close_indexes = idx.nearest(segment.bounds, 10)
for close_index in close_indexes:
if index >= close_index: # do not count duplicates
continue
close_segment = segments[close_index]
if segment.distance(close_segment) < max_distance:
count += 1
return count
if __name__ == "__main__":
MAX_DIST = 0.1
s = generate_segments(1000)
r_idx = populate_index(s)
t = time.time()
print("found %d pairs of close segments using classic method" % count_close_segments(s, MAX_DIST))
print("%.2f seconds for classic count" % (time.time() - t))
t = time.time()
print("found %d pairs of close segments using index method" % count_close_segments_index(s, r_idx, MAX_DIST))
print("%.2f seconds for index count" % (time.time() - t))