Tags: python, pandas, timestamp

Timestamp: how to calculate the time difference in seconds with a groupby


I have a pandas DataFrame with two columns, id and date; the date column is precise down to the second.

import pandas as pd

data = {
    'id': [17, 17, 17, 17, 17, 18, 18, 18, 18],
    'date': ['2018-01-16', '2018-01-26', '2018-01-27', '2018-02-11', '2018-03-14',
             '2018-01-28', '2018-02-12', '2018-02-25', '2018-03-04'],
}

df1 = pd.DataFrame(data)

I would like to add a new column, tslt ('time_since_last_transaction'). The first transaction for each unique id could be a placeholder number, say 1. Each subsequent transaction for that id should hold the difference, in seconds, between that id's first timestamp and the current row's timestamp.

I tried datetime, timedelta, and so on, but did not have much luck. Any help would be appreciated.


Solution

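  • You can try groupby().transform(): convert the date column to datetime, subtract each id's earliest date from every row, and convert the resulting timedelta to seconds: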

    df1['date'] = pd.to_datetime(df1['date'])
    
    df1['diff'] = df1['date'].sub(df1.groupby('id').date.transform('min')).dt.total_seconds()
    

    Output:

       id       date       diff
    0  17 2018-01-16        0.0
    1  17 2018-01-26   864000.0
    2  17 2018-01-27   950400.0
    3  17 2018-02-11  2246400.0
    4  17 2018-03-14  4924800.0
    5  18 2018-01-28        0.0
    6  18 2018-02-12  1296000.0
    7  18 2018-02-25  2419200.0
    8  18 2018-03-04  3024000.0
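
The column name in the question ('time_since_last_transaction') could also be read as the gap to the previous transaction rather than to the first one. The following is a minimal sketch of both readings side by side, assuming the sample data from the question. The column names `since_first` and `since_prev` are made up for illustration, and filling the first row of each id with 1 follows the placeholder the question suggests.

```python
import pandas as pd

# Sample data from the question (dates only; real data may include times).
data = {
    "id": [17, 17, 17, 17, 17, 18, 18, 18, 18],
    "date": ["2018-01-16", "2018-01-26", "2018-01-27", "2018-02-11", "2018-03-14",
             "2018-01-28", "2018-02-12", "2018-02-25", "2018-03-04"],
}
df1 = pd.DataFrame(data)
df1["date"] = pd.to_datetime(df1["date"])

# Sort so that "previous row" means "previous transaction" within each id.
df1 = df1.sort_values(["id", "date"]).reset_index(drop=True)

# Reading 1: seconds since the first transaction of each id
# (the same calculation as the answer above).
first_seen = df1.groupby("id")["date"].transform("min")
df1["since_first"] = (df1["date"] - first_seen).dt.total_seconds()

# Reading 2: seconds since the previous transaction of each id.
# The first row of each group has no predecessor, so diff() gives NaT there;
# it is filled with 1 here, the placeholder suggested in the question.
df1["since_prev"] = df1.groupby("id")["date"].diff().dt.total_seconds().fillna(1)

print(df1)
```

If the placeholder 1 is also wanted for the first reading, replacing the zeros in `since_first` after the fact is one option, though that would also overwrite any later transaction that shares the first timestamp exactly.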