pandas, group-by, great-circle

How to group by trip ID and find the straight-line distance traveled?


I have the following data:

Trip      Start_Lat   Start_Long    End_lat      End_Long    Starting_point    Ending_point
Trip_1    56.5624     -85.56845       58.568       45.568         A               B
Trip_1    58.568       45.568       -200.568     -290.568         B               C 
Trip_1   -200.568     -290.568       56.5624     -85.56845        C               D
Trip_2    56.5624     -85.56845     -85.56845    -200.568         A               B
Trip_2   -85.56845    -200.568      -150.568     -190.568         B               C

I would like to find the circuity, which is

   Circuity = Total Distance Traveled (Trip A+B+C+D) - Straight Line (Trip A to D)
              ---------------------------------------------------------------------
                       Total Distance Traveled (Trip A+B+C+D)
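
To make the formula concrete, here is a minimal sketch in plain Python. The function name `circuity` and the leg lengths are made up for illustration and are not derived from the table above.

```python
def circuity(total_distance, straight_distance):
    """Return (total - straight) / total, i.e. the share of travel that is detour."""
    if total_distance <= 0:
        raise ValueError("total_distance must be positive")
    return (total_distance - straight_distance) / total_distance

# Hypothetical leg lengths in km (illustrative numbers only)
legs = [300.0, 400.0, 500.0]
straight = 600.0

print(circuity(sum(legs), straight))  # 0.5
```

A value of 0 means the trip followed the straight line exactly, and a value of 1 means the trip ended where it started.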

I tried the following code:

    df['Distance']= df['flight_distance'] = df.apply(lambda x: great_circle((x['start_lat'], x['start_long']), (x['end_lat'], x['end_long'])).km, axis = 1) 
    df['Total_Distance'] = ((df.groupby('Trip')['distance'].shift(2) +['distance'].shift(1) + df['distance']).abs())

Could you help me find the straight-line distance and the circuity?


Solution

  • UPDATE:

    You may want to convert your values to numeric dtypes first:

    import pandas as pd

    # non-numeric entries become NaN instead of raising an error
    df[['Start_Lat','Start_Long','End_lat','End_Long']] = \
    df[['Start_Lat','Start_Long','End_lat','End_Long']].apply(pd.to_numeric, errors='coerce')
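
As a small illustration of what `errors='coerce'` does, the sketch below converts a hypothetical column of strings in which one entry is malformed. The series `raw` is made-up data, not taken from the question.

```python
import pandas as pd

# Hypothetical raw column: coordinates read in as strings, one of them malformed
raw = pd.Series(['56.5624', '-85.56845', 'n/a'], dtype=object)

# errors='coerce' turns anything that cannot be parsed into NaN
clean = pd.to_numeric(raw, errors='coerce')

print(clean.isna().sum())  # 1
```

Rows that end up as NaN will propagate NaN into any distance computed from them, so it may be worth inspecting or dropping them before grouping.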
    

    If I understand correctly, you can do it this way:

    import numpy as np

    # vectorized haversine function
    def haversine(lat1, lon1, lat2, lon2, to_radians=True, earth_radius=6371):
        """
        Slightly modified version of http://stackoverflow.com/a/29546836/2901002
    
        Calculate the great circle distance between two points
        on the earth (specified in decimal degrees or in radians)
    
        All (lat, lon) coordinates must have numeric dtypes and be of equal length.
    
        """
        if to_radians:
            lat1, lon1, lat2, lon2 = np.radians([lat1, lon1, lat2, lon2])
    
        a = np.sin((lat2-lat1)/2.0)**2 + \
            np.cos(lat1) * np.cos(lat2) * np.sin((lon2-lon1)/2.0)**2
    
        return earth_radius * 2 * np.arcsin(np.sqrt(a))
    
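    def f(df):
        # great-circle distance from the trip's first start point to its last end point
        straight = haversine(df['Start_Lat'].iloc[0], df['Start_Long'].iloc[0],
                             df['End_lat'].iloc[-1], df['End_Long'].iloc[-1])
        # sum of the great-circle distances of the individual legs
        total = haversine(df['Start_Lat'], df['Start_Long'],
                          df['End_lat'], df['End_Long']).sum()
        # circuity = (total - straight) / total
        return 1 - straight / total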
    
    df.groupby('Trip').apply(f)
    

    Result:

    In [120]: df.groupby('Trip').apply(f)
    Out[120]:
    Trip
    Trip_1    1.000000
    Trip_2    0.499825
    dtype: float64
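
For comparison, the same quantities can probably also be computed without `groupby(...).apply`, by first adding a per-leg distance column and then aggregating. The sketch below is one possible variant, not the approach used above. The helper `haversine_km`, the frame `legs`, and its coordinates are hypothetical; it also assumes rows are already ordered by leg within each trip and that the first and last rows of each trip contain no NaN values.

```python
import numpy as np
import pandas as pd

def haversine_km(lat1, lon1, lat2, lon2, earth_radius=6371):
    """Great-circle distance in km between points given in decimal degrees."""
    lat1, lon1, lat2, lon2 = (
        np.radians(np.asarray(v, dtype=float)) for v in (lat1, lon1, lat2, lon2)
    )
    a = (np.sin((lat2 - lat1) / 2.0) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2.0) ** 2)
    return earth_radius * 2 * np.arcsin(np.sqrt(a))

# Hypothetical data: T1 runs straight along the equator, T2 goes out and back
legs = pd.DataFrame({
    'Trip':       ['T1', 'T1', 'T2', 'T2'],
    'Start_Lat':  [0.0, 0.0, 10.0, 10.0],
    'Start_Long': [0.0, 10.0, 0.0, 20.0],
    'End_lat':    [0.0, 0.0, 10.0, 10.0],
    'End_Long':   [10.0, 20.0, 20.0, 0.0],
})

# distance of every individual leg
legs['leg_km'] = haversine_km(legs['Start_Lat'], legs['Start_Long'],
                              legs['End_lat'], legs['End_Long'])

# per trip: total distance, first start point, last end point
g = legs.groupby('Trip')
summary = pd.DataFrame({
    'total_km':  g['leg_km'].sum(),
    'start_lat': g['Start_Lat'].first(),
    'start_lon': g['Start_Long'].first(),
    'end_lat':   g['End_lat'].last(),
    'end_lon':   g['End_Long'].last(),
})

# straight-line (great-circle) distance from first start to last end
summary['straight_km'] = haversine_km(summary['start_lat'], summary['start_lon'],
                                      summary['end_lat'], summary['end_lon'])
summary['circuity'] = 1 - summary['straight_km'] / summary['total_km']

print(summary[['total_km', 'straight_km', 'circuity']])
```

With this made-up data, T1 should come out with a circuity of about 0 (no detour) and T2 with a circuity of 1 (the trip ends where it started), which mirrors the `Trip_1` result above.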