python pandas group-by euclidean-distance

Average distance within group in pandas


I have a dataframe like this:

import pandas as pd

df = pd.DataFrame({
    'id': ['A','A','B','B','B'],
    'x': [1,1,2,2,3],
    'y': [1,2,2,3,3]
})

The output I want is the average pairwise distance between the points in each group. In this example:

group A: (distance((1,1),(1,2))) /1 = 1

group B: (distance((2,2),(2,3)) + distance((2,3),(3,3)) + distance((2,2),(3,3))) /3 = 1.138
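
As a quick check of the arithmetic above, here is a minimal sketch that averages np.linalg.norm over every unordered pair of points in a plain list. The helper name mean_pairwise_distance is hypothetical and only for illustration; it does not use groupby at all.

```python
from itertools import combinations

import numpy as np


def mean_pairwise_distance(points):
    """Average Euclidean distance over all unordered pairs of points.

    Hypothetical helper for illustration; expects at least two points.
    """
    dists = [
        np.linalg.norm(np.subtract(p, q))  # distance between one pair
        for p, q in combinations(points, 2)
    ]
    return float(np.mean(dists))


group_a = [(1, 1), (1, 2)]
group_b = [(2, 2), (2, 3), (3, 3)]

print(mean_pairwise_distance(group_a))            # 1.0
print(round(mean_pairwise_distance(group_b), 3))  # 1.138
```

Group B has three pairs with distances 1, 1 and sqrt(2), so the mean is (2 + sqrt(2)) / 3, which rounds to 1.138.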

I can calculate the distance with np.linalg.norm, but I am confused about how to use it with pandas groupby. Thank you.

Note: one idea I am trying is to first build a dataframe that contains every pair of points within a group (this is where I am stuck). After that, I would only need to compute the distance for each pair and take the groupby mean.


Solution

  • A possible solution, based on numpy broadcasting:

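    import numpy as np
    import pandas as pd

    def calc_avg_distance(group):
        # (n, 2) array holding the group's points
        pts = group[['x', 'y']].to_numpy()
        # Broadcasting gives an (n, n) matrix of pairwise Euclidean distances
        dist_matrix = np.sqrt(((pts - pts[:, np.newaxis])**2).sum(axis=2))
        # Mask each point's zero distance to itself so it is left out of the mean
        np.fill_diagonal(dist_matrix, np.nan)
        return np.nanmean(dist_matrix)

    # Select only the coordinate columns so apply does not act on the grouping column
    (df.groupby('id')[['x', 'y']].apply(calc_avg_distance)
     .reset_index(name='avg_distance'))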
    

    Output:

      id  avg_distance
    0  A      1.000000
    1  B      1.138071
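
    The pair-table idea from the question's note can also be made to work. The following is a minimal sketch, assuming a self-merge on id is acceptable for the group sizes involved (it creates n² rows per group before filtering). The helper column row and the suffixes _1 and _2 are names chosen here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'id': ['A', 'A', 'B', 'B', 'B'],
    'x': [1, 1, 2, 2, 3],
    'y': [1, 2, 2, 3, 3],
})

# Number the rows so each unordered pair can be kept exactly once.
pts = df.assign(row=np.arange(len(df)))

# A self-merge on 'id' pairs every point with every point of the same group.
pairs = pts.merge(pts, on='id', suffixes=('_1', '_2'))

# Keeping row_1 < row_2 drops self-pairs and mirrored duplicates.
pairs = pairs[pairs['row_1'] < pairs['row_2']].copy()

# Euclidean distance for each remaining pair.
pairs['dist'] = np.hypot(pairs['x_1'] - pairs['x_2'],
                         pairs['y_1'] - pairs['y_2'])

avg = pairs.groupby('id')['dist'].mean().reset_index(name='avg_distance')
print(avg)
```

    One difference from the broadcasting version: a group with a single point produces no pairs, so it is absent from this result instead of appearing with NaN.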