knn, nearest-neighbor, dbscan, haversine, epsilon

How to determine the optimal epsilon value in meters for DBSCAN by plotting the KNN elbow


Before running DBSCAN I need to find the optimal epsilon value. All the points are geographical coordinates, and I need the epsilon value in meters before converting it to radians to apply DBSCAN with the haversine metric.

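from sklearn.neighbors import NearestNeighbors

# 'y' and 'x' are assumed to be latitude and longitude in degrees;
# the default metric is Euclidean, so these distances are in degrees too
neigh = NearestNeighbors(n_neighbors=4)
nbrs = neigh.fit(firms[['y', 'x']])
distances, indices = nbrs.kneighbors(firms[['y', 'x']])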

And then:

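import numpy as np
import matplotlib.pyplot as plt

# Plotting K-distance Graph
distances = np.sort(distances, axis=0)
# Column 0 is each point's distance to itself (0); column 1 is its nearest other point
distances = distances[:, 1]
plt.figure(figsize=(20, 10))
plt.plot(distances)
plt.title('K-distance Graph', fontsize=20)
plt.xlabel('Data Points sorted by distance', fontsize=14)
plt.ylabel('Epsilon', fontsize=14)
plt.show()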

This is the resulting graph, but I need the epsilon value in meters.

[Figure: K-distance Graph]


Solution

  • I hope this helps clarify. Just a few observations:

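    a) You are already finding the optimal epsilon value with that method, and from your figure eps ≈ 0.005. Because NearestNeighbors used its default Euclidean metric on the raw latitude/longitude values, this value is in degrees, not meters.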

    b) Since your points are geographic coordinates, you don't need to find epsilon in meters first and only then convert it to radians for DBSCAN with the haversine metric. You can convert the coordinates straight to radians: haversine_distances returns distances in radians (angles on a unit sphere), so multiplying by the Earth's radius, 6371000 m, gives meters, and dividing that by 1000 gives kilometers, like this:

    from sklearn.metrics.pairwise import haversine_distances
    from math import radians
    bsas = [-34.83333, -58.5166646]
    paris = [49.0083899664, 2.53844117956]
    bsas_in_radians = [radians(_) for _ in bsas]
    paris_in_radians = [radians(_) for _ in paris]
    result = haversine_distances([bsas_in_radians, paris_in_radians])
    result * 6371000/1000  # multiply by Earth radius to get kilometers
    

    Code snippet from:

    https://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.haversine_distances.html
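To get the k-distance graph directly in meters, one possible approach (a sketch, not part of the original answer) is to run NearestNeighbors with scikit-learn's haversine metric on radian coordinates, scale the resulting distances by the Earth's radius, and then divide the eps you read off the elbow by the same radius before passing it to DBSCAN. The helper names `k_distances_m` and `dbscan_haversine`, the sample points, and the 200 m eps are hypothetical. As in the question, `n_neighbors` counts each point itself, and the same number is reused as `min_samples`, which in scikit-learn's DBSCAN also counts the point itself.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.neighbors import NearestNeighbors

EARTH_RADIUS_M = 6371000  # mean Earth radius in meters, as in the snippet above


def k_distances_m(lat_lon_deg, n_neighbors=4):
    """Hypothetical helper: sorted distance in meters from each point to the
    farthest of its n_neighbors nearest points (the point itself included)."""
    coords_rad = np.radians(lat_lon_deg)  # haversine expects [lat, lon] in radians
    nn = NearestNeighbors(n_neighbors=n_neighbors, metric="haversine", algorithm="ball_tree")
    dist_rad, _ = nn.fit(coords_rad).kneighbors(coords_rad)  # column 0 is the point itself
    return np.sort(dist_rad[:, -1] * EARTH_RADIUS_M)  # unit-sphere radians -> meters


def dbscan_haversine(lat_lon_deg, eps_m, min_samples=4):
    """Hypothetical helper: DBSCAN with eps given in meters."""
    coords_rad = np.radians(lat_lon_deg)
    eps_rad = eps_m / EARTH_RADIUS_M  # meters -> radians
    db = DBSCAN(eps=eps_rad, min_samples=min_samples, metric="haversine", algorithm="ball_tree")
    return db.fit_predict(coords_rad)


# Hypothetical sample: two tight groups (points roughly 45-90 m apart) plus one isolated point
points = np.array([
    [48.8566, 2.3522], [48.8570, 2.3522], [48.8566, 2.3528], [48.8562, 2.3522], [48.8566, 2.3516],
    [-34.6037, -58.3816], [-34.6033, -58.3816], [-34.6037, -58.3811],
    [-34.6041, -58.3816], [-34.6037, -58.3821],
    [0.0, 0.0],
])

k_dist = k_distances_m(points, n_neighbors=4)  # plot this to find the elbow in meters
labels = dbscan_haversine(points, eps_m=200, min_samples=4)
print(np.round(k_dist, 1))
print(labels)
```

Plotting `k_dist` with the question's matplotlib code gives a y-axis in meters, so the eps read off the elbow can be passed to `dbscan_haversine` without any manual conversion.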