python dataframe geospatial haversine

Is there a way to find the nearest lat_longs and then group them together in Python?


[Image: data sheet with example coordinates]

I want to merge lat_longs that lie within a haversine distance of 1 km of each other into a single lat_long and push the results into a list, together with the lat_longs that are not within 1 km of any other point.

I have used the haversine package to calculate the haversine distance:

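import haversine as hs

def get_dist(loc_1, loc_2):
    # Each location is a "lat,lon" string
    lat_1, lon_1 = (float(x) for x in loc_1.split(","))
    lat_2, lon_2 = (float(x) for x in loc_2.split(","))

    # hs.haversine takes two (lat, lon) tuples and returns kilometres by default
    return hs.haversine((lat_1, lon_1), (lat_2, lon_2))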
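
For reference, the haversine formula itself can be sketched with only the standard library. This is a minimal illustration, not the package's implementation: the function name `haversine_km` and the constant `EARTH_RADIUS_KM` are made up for this example, and the radius is the same mean Earth radius that appears as `kms_per_radian` in the solution.

```python
import math

# Assumption: mean Earth radius in km (same value as kms_per_radian in the solution)
EARTH_RADIUS_KM = 6371.0088

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    # Haversine of the central angle between the two points
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def within_1km(lat1, lon1, lat2, lon2):
    # The 1 km threshold from the question
    return haversine_km(lat1, lon1, lat2, lon2) < 1.0
```

Comparing every pair of rows this way is quadratic in the number of points, which is one reason a tree-based clustering method is attractive for larger tables.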

Solution

  • My aim was to cluster the geospatial locations in order to find natural-gas pumps in the database.

    I used DBSCAN for this.

    Code:-

        # Split the "lat,lon" strings into two numeric columns
        final_df[['latitude', 'longitude']] = final_df['start_cord'].str.split(",", expand=True)
        print(len(final_df))

        del final_df['start_cord']

        final_df['latitude'] = pd.to_numeric(final_df['latitude'])
        final_df['longitude'] = pd.to_numeric(final_df['longitude'])
        final_df = final_df.reset_index(drop=True)

        # Use only the coordinate columns, in (lat, lon) order as the haversine metric expects
        coords = final_df[['latitude', 'longitude']].to_numpy()

        kms_per_radian = 6371.0088
        epsilon = 0.3 / kms_per_radian  # 0.3 km neighbourhood radius, expressed in radians
        db = DBSCAN(eps=epsilon, min_samples=10,
                    algorithm='ball_tree', metric='haversine').fit(np.radians(coords))
        cluster_labels = db.labels_
        # Label -1 marks noise (points in no cluster), so it is not counted as a cluster
        num_clusters = len(set(cluster_labels)) - (1 if -1 in cluster_labels else 0)
        clusters = pd.Series([coords[cluster_labels == n] for n in range(num_clusters)])
        print('Number of clusters: {}'.format(num_clusters))
    

    Imports

    import pandas as pd, numpy as np, matplotlib.pyplot as plt
    from sklearn.cluster import DBSCAN
    from geopy.distance import great_circle
    from shapely.geometry import MultiPoint
    import haversine as hs
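
To get from the cluster labels to "one lat_long per group, plus the leftovers", something like the following could work. It is only a sketch under some assumptions: the helper name `summarize_clusters` and the sample data are made up, and each cluster is represented by the plain mean of its latitudes and longitudes, which is a reasonable approximation only for small clusters that are away from the poles and the antimeridian.

```python
from collections import defaultdict

def summarize_clusters(coords, labels):
    """Collapse each cluster to one (lat, lon); return noise points unchanged."""
    grouped = defaultdict(list)
    for point, label in zip(coords, labels):
        grouped[int(label)].append((float(point[0]), float(point[1])))

    # DBSCAN assigns the label -1 to points that belong to no cluster
    unclustered = grouped.pop(-1, [])

    centers = {}
    for label, points in grouped.items():
        # Assumption: the arithmetic mean is an acceptable representative point
        mean_lat = sum(p[0] for p in points) / len(points)
        mean_lon = sum(p[1] for p in points) / len(points)
        centers[label] = (mean_lat, mean_lon)
    return centers, unclustered

# Made-up sample: two nearby points in cluster 0 and one noise point
sample_coords = [(12.9716, 77.5946), (12.9718, 77.5948), (13.0827, 80.2707)]
sample_labels = [0, 0, -1]
centers, unclustered = summarize_clusters(sample_coords, sample_labels)
```

With the code above, the call would be `summarize_clusters(coords, db.labels_)`. Note that with `min_samples=10` most isolated points end up in the unclustered list, so that parameter may need lowering if small groups should also be merged.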