pythonpandasdatetime

Calculate average daily amount of time between two news posted by a unique Source


I have a pandas dataframe (I have simplified table with one date showing in example), I want to calculate the average daily amount of time between two news posted by a unique Source

Input

source          date           time     
Investing.com   2017-05-11     08:00:00     
Investing.com   2017-05-11     12:00:00
Investing.com   2017-05-11     16:00:00 
yahoo.com       2017-05-11     09:00:00 
yahoo.com       2017-05-11     12:00:00
yahoo.com       2017-05-11     15:00:00
yahoo.com       2017-05-12     06:00:00 
yahoo.com       2017-05-12     12:00:00
yahoo.com       2017-05-12     18:00:00  

Desired_output

source          date           Average_Daily_time   
Investing.com   2017-05-11     04:00:00      
yahoo.com       2017-05-11     03:00:00
yahoo.com       2017-05-12     06:00:00 

My Attempt

I merged the datetime in one timestamp and called it datetime

df.sort_values('datetime').groupby('source')['datetime'].apply(lambda x: x.diff().dt.seconds.mean()/60)

Issue

It calculates average time for all dates combined, not separate dates. How to show average time for separate dates?


Solution

  • Convert the time column to timedelta, then group the dataframe by source and date and aggregate time using a lambda function to calculate the mean of diff between rows

    df['time'] = pd.to_timedelta(df['time'])
    (
        df.groupby(['source', 'date'])['time']
          .agg(lambda d: d.diff().mean()).reset_index(name='avg')
    )
    

              source        date             avg
    0  Investing.com  2017-05-11 0 days 04:00:00
    1      yahoo.com  2017-05-11 0 days 03:00:00
    2      yahoo.com  2017-05-12 0 days 06:00:00