I have a list of n
2D points (x, y) and a list of m
rectangles (xmin, ymin, xmax, ymax). I want to find indices of all rectangles that enclose a point. This has to be done efficiently. GPU is also available. I'm looking at a solution that is heavily parallelised through vectorised operations. For loops are a no-go for me as 0 < n,m < 10^6
For example:
points = [[1, 1], [2, 2], [5, 5]]
rectangles = [[0, 0, 3, 3], [1, 1, 4, 4], [4, 4, 7, 7]]
result:
(1, 1): [(0, 0, 3, 3), (1, 1, 4, 4)]
(2, 2): [(0, 0, 3, 3), (1, 1, 4, 4)]
(5, 5): [(4, 4, 7, 7)]
I've written this solution in PyTorch (tried chunking to reduce memory overflow)
def find_rectangles_containing_points(points, rectangles, chunk_size=1000):
n = points.size(0)
m = rectangles.size(0)
rectangles_containing_points = []
for i in range(0, n, chunk_size):
points_chunk = points[i:i + chunk_size]
points_expanded = points_chunk.unsqueeze(1) # chunk_size x 1 x 2
rectangles_expanded = rectangles.unsqueeze(0) # 1 x m x 4
is_inside = (rectangles_expanded[:, :, :2] <= points_expanded) & (points_expanded <= rectangles_expanded[:, :, 2:])
result = torch.nonzero(is_inside.all(dim=-1))
rectangles_containing_points.extend([list(rect_ids) for rect_ids in result.t()])
return rectangles_containing_points
points = torch.rand(1000000, 2)
rectangles = torch.rand(10000, 4)
find_rectangles_containing_points(points, rectangles)
However, this solution doesn't scale to larger values of n and m (~10^6). Are there approaches designed for such specific problems?
I'd try to get to your problem the other way around by first assessing what points fall into a given rectangle. This operation is quite straightforward and can easily be vectorized:
import numpy as np
# points as numpy array and rectangle coordinates [x_min, y_min, x_max, y_max]
in_rectangle = np.logical_and(
np.logical_and(points[:, 0] >= x_min, points[:, 0] <= x_max),
np.logical_and(points[:, 1] >= y_min, points[:, 1] <= y_max)
)
You can then repeat this operation for your m rectangles.
Note that lists are not best for storing this type of result. On the other hand, an (m, n) array would likely not fit in memory if n and m ~ 1e6. That's why you could use a sparse matrix (that only stores non-zero values).
Below, S[i, j] = 1
if point j is in rectangle i.
import numpy as np
from scipy.sparse import dok_matrix
points = np.array([[1, 1], [2, 2], [3, 3], [4, 4], [5, 5]])
rectangles = np.array([[0, 0, 3, 3], [1, 1, 4, 4], [4, 4, 7, 7]])
S = dok_matrix((len(rectangles), len(points)), dtype=np.float32)
for i, coords in enumerate(rectangles):
x_min, y_min, x_max, y_max = coords
in_rectangle = np.logical_and(
np.logical_and(points[:, 0] >= x_min, points[:, 0] <= x_max),
np.logical_and(points[:, 1] >= y_min, points[:, 1] <= y_max)
)
S[i, np.nonzero(in_rectangle)] = 1
print(S.toarray())
Once the matrix is filled, S[:, j]
gives whether a rectangle i contains the point j (1 if so, no value if not). You may use the nonzero
function to directly get the rectangles indices.
I'm not very familiar with GPU computing but I think you could tweak this code to make it work with PyTorch or JAX instead of Numpy.