pythonpandaseuclidean-distancekdtreescipy-spatial

Calculate Distance to the nearest object


I need to make a map of distances to the nearest object. I have a solution where i am looping over every point of a map, and every object, calculating the distance to all of them, and then leaving only minimum distance. The problem here is that if I am woking with real data, the map can easily contain 10s of millions of points, and there can be more than 100 objects. Is there any better code implementation for solving this problem?

Loading packages

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

Generate synthetic map

coord_dict = {"X": [],
              "Y": []}

for x_value in range(0, 10000, 50):    
    for y_value in range(0, 5000, 50):
        coord_dict["X"].append(x_value)
        coord_dict["Y"].append(y_value)

map_df = pd.DataFrame(coord_dict)

Generate points to calculate distance from

well_points_dict = {"X": [500, 1500, 4000, 5500, 6250, 7500, 8000, 9000], 
                    "Y": [500, 4000, 2000, 1500, 500, 5000, 100, 2500]}

wells_df = pd.DataFrame(well_points_dict)

Calculate distances

calculations_count = 0
distance_map = np.zeros(map_df.shape)

for i in range(map_df.shape[0]):
    d = []
    for j in range(wells_df.shape[0]):
        d.append(((map_df["X"].iloc[i]-wells_df["X"][j])**2 + (map_df["Y"].iloc[i]-      wells_df["Y"][j])**2)**0.5)
        calculations_count += 1
    dd = min(d)
    distance_map[i,1] = dd
    # print(calculations_count)

Print resulting map

plt.figure(figsize=(10,10))
plt.scatter(x=map_df["X"],y=map_df["Y"],c=distance_map[:,1],s=1,cmap='terrain')
for i in range(len(wells_df)):
    plt.plot(wells_df["X"][i],wells_df["Y"][i], color='black', marker='o',markersize=3)
plt.title('Calculated map')
plt.xlabel('X')
plt.ylabel('Y')
plt.axis('scaled')
plt.tight_layout()
plt.colorbar(shrink=0.25)

Result map example:

Result map example


Solution

  • KDTree is what you are looking for, there is an implementation of it in scipy.spatial.

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy import spatial
    

    Given your trial points:

    x_value = np.arange(0, 10000, 50)  
    y_value = np.arange(0, 5000, 50)
    X, Y = np.meshgrid(x_value, y_value)
    points = np.stack([X.ravel(), Y.ravel()]).T
    

    And well points:

    x_well = np.array([500, 1500, 4000, 5500, 6250, 7500, 8000, 9000])
    y_well = np.array([500, 4000, 2000, 1500, 500, 5000, 100, 2500])
    wells = np.stack([x_well, y_well]).T
    

    We can create a KDTree:

    interpolator = spatial.KDTree(wells)
    

    And query efficiently the tree to get distances and also indices of which point it is closer:

    distances, indices = interpolator.query(points)
    # 7.12 ms ± 711 µs per loop (mean ± std. dev. of 30 runs, 100 loops each)
    

    Plotting the result leads to:

    fig, axe = plt.subplots()
    axe.scatter(*points.T, marker=".", c=distances)
    axe.scatter(*wells.T, color="black")
    axe.grid()
    

    enter image description here

    We see the Voronoi diagram appearing on the color map which is a good confirmation distances are correctly interpreted wrt reference points (wells):

    voronoi = spatial.Voronoi(wells)
    # ...
    spatial.voronoi_plot_2d(voronoi, ax=axe)
    

    enter image description here

    Where the object Voronoi is the Voronoi diagram based on your reference points (wells) and voronoi_plot_2d an helper to draw it on axes.