I want to group the lat_longs that lie within a haversine distance of 1 km of each other, collapse each group into a single lat_long, and push the results into a list, together with the lat_longs that are farther than 1 km from every other point.
I have used the haversine package to calculate the haversine distance:
def get_dist(loc_1, loc_2):
    # Each location is a "lat,lon" string; hs.haversine returns kilometres by default
    loc_1 = loc_1.split(",")
    loc_2 = loc_2.split(",")
    loc_1 = (float(loc_1[0]), float(loc_1[1]))
    loc_2 = (float(loc_2[0]), float(loc_2[1]))
    val = hs.haversine(loc_1, loc_2)
    return val
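For reference, here is a minimal sketch of the great-circle formula that a haversine distance is based on, written with the standard library only. The names `haversine_km` and `EARTH_RADIUS_KM` are made up for this illustration, and the radius is assumed to be the same mean Earth radius used as `kms_per_radian` further down; it is a cross-check for `get_dist`, not the package's actual implementation.

```python
import math

# Assumption: mean Earth radius in km (same value as kms_per_radian below)
EARTH_RADIUS_KM = 6371.0088

def haversine_km(p1, p2):
    """Great-circle distance in km between two (lat, lon) pairs given in degrees."""
    lat1, lon1 = map(math.radians, p1)
    lat2, lon2 = map(math.radians, p2)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    # Haversine of the central angle between the two points
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def within_1km(p1, p2):
    """True if the two points are less than 1 km apart."""
    return haversine_km(p1, p2) < 1.0
```

Two points would then be candidates for merging whenever `within_1km(a, b)` is true.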
My aim is to cluster the geospatial locations in order to find the natural-gas pumps in the database.
I used DBSCAN for this.
Code:
final_df[['latitude', 'longitude']] = final_df['start_cord'].str.split(",", expand=True)
print(len(final_df))
del final_df['start_cord']
final_df['latitude'] = pd.to_numeric(final_df['latitude'])
final_df['longitude'] = pd.to_numeric(final_df['longitude'])
final_df = final_df.reset_index(drop=True)
# Use only the coordinate columns, in (lat, lon) order as the haversine metric expects
coords = final_df[['latitude', 'longitude']].to_numpy()
kms_per_radian = 6371.0088
# eps is expressed in radians: 0.3 km here; use 1.0 / kms_per_radian for a 1 km radius
epsilon = 0.3 / kms_per_radian
db = DBSCAN(eps=epsilon, min_samples=10,
            algorithm='ball_tree', metric='haversine').fit(np.radians(coords))
cluster_labels = db.labels_
# Label -1 marks noise points (fewer than min_samples neighbours), so it is not a cluster
num_clusters = len(set(cluster_labels)) - (1 if -1 in cluster_labels else 0)
clusters = pd.Series([coords[cluster_labels == n] for n in range(num_clusters)])
print('Number of clusters: {}'.format(num_clusters))
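To get from the DBSCAN labels to the list described at the top (one lat_long per cluster, plus the points that were not clustered), something like the following could work. It is only a sketch: `collapse_clusters` is a hypothetical helper, it assumes `coords` holds (lat, lon) rows and `labels` is `db.labels_`, and it uses a plain average of the coordinates, which is a reasonable approximation only for small clusters that do not straddle the antimeridian.

```python
from collections import defaultdict

def collapse_clusters(coords, labels):
    """Return unclustered points unchanged, followed by one centroid per cluster."""
    groups = defaultdict(list)
    merged = []
    for (lat, lon), label in zip(coords, labels):
        point = (float(lat), float(lon))
        if label == -1:
            merged.append(point)          # noise: keep the point as it is
        else:
            groups[label].append(point)   # collect the members of each cluster
    for label in sorted(groups):
        pts = groups[label]
        # Assumption: arithmetic mean of lat/lon is close enough at sub-km scale
        merged.append((sum(p[0] for p in pts) / len(pts),
                       sum(p[1] for p in pts) / len(pts)))
    return merged
```

Calling `collapse_clusters(coords, cluster_labels)` would give the combined list. Note that with `min_samples=10` any group of fewer than ten nearby points ends up as noise and is kept point by point; a smaller `min_samples` would be needed to merge pairs of nearby points.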
Imports:
import pandas as pd, numpy as np, matplotlib.pyplot as plt
from sklearn.cluster import DBSCAN
from geopy.distance import great_circle
from shapely.geometry import MultiPoint
import haversine as hs