I have this table:
Date | User ID | Calls | Hours |
---|---|---|---|
3-9-2022 | 2234 | 10 | 01:06:19 |
3-9-2022 | 2234 | 4 | 00:32:54 |
I want to group it into this (resetting the index as well):
Date | User ID | Calls | Hours |
---|---|---|---|
3-9-2022 | 2234 | 14 | 01:39:13 |
I am stuck on aggregating the hours. I read that you have to convert them to timedelta first, but I am not sure how to put that conversion into the groupby (perhaps with a lambda?). Calls, on the other hand, should just use the normal sum. Here is my current code:
df.groupby(['Date','UserID'])['Calls','Hours'].apply(lambda x : x.astype(int).sum()).reset_index()
Data types:
Date - Datetime
UserID - Int
Calls - Int
Hours - Datetime (Time)
Any help appreciated!
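One detail worth checking, given the Hours type listed above: if the column really holds datetime.time objects rather than 'HH:MM:SS' strings, it has to pass through strings before pd.to_timedelta. A minimal sketch under that assumption (the DataFrame below is a hand-made stand-in for the real data):

```python
import datetime as dt

import pandas as pd

# Hypothetical rebuild of the sample table, ASSUMING Hours holds datetime.time
# objects. Date is kept as a plain string because its day/month order isn't stated.
df = pd.DataFrame({
    'Date': ['3-9-2022', '3-9-2022'],
    'User ID': [2234, 2234],
    'Calls': [10, 4],
    'Hours': [dt.time(1, 6, 19), dt.time(0, 32, 54)],
})

# A datetime.time is a clock time, not a duration, so it has no meaningful sum.
# str() turns each value into 'HH:MM:SS', which pd.to_timedelta parses as a duration.
hours = pd.to_timedelta(df['Hours'].astype(str))
print(hours.sum())  # 0 days 01:39:13
```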
ANSWER
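import pandas as pd
# astype(str) lets 'HH:MM:SS' strings or datetime.time values parse as timedeltas
df['Hours'] = pd.to_timedelta(df['Hours'].astype(str))
df_group = df.groupby(['Date', 'User ID']).agg({'Calls': 'sum', 'Hours': 'sum'}).reset_index()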
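The lambda idea from the question also works if the conversion happens inside the aggregation itself. A minimal sketch using named aggregation (available since pandas 0.25), with a hand-built sample frame standing in for the real data:

```python
import pandas as pd

# Hypothetical sample data matching the question's table (Hours as 'HH:MM:SS' strings).
df = pd.DataFrame({
    'Date': ['3-9-2022', '3-9-2022'],
    'User ID': [2234, 2234],
    'Calls': [10, 4],
    'Hours': ['01:06:19', '00:32:54'],
})

# Named aggregation: each output column gets its own (column, function) pair,
# so the timedelta conversion can live in a lambda next to a plain 'sum'.
df_group = df.groupby(['Date', 'User ID']).agg(
    Calls=('Calls', 'sum'),
    Hours=('Hours', lambda s: pd.to_timedelta(s.astype(str)).sum()),
).reset_index()
print(df_group)
```

This leaves the original Hours column untouched; the trade-off is that the conversion runs once per group instead of once over the whole column.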
First convert the Hours column to timedeltas, aggregate with sum, and then convert back to HH:MM:SS format:
# Format a Timedelta as HH:MM:SS (hours keep counting past 24 instead of wrapping)
def f(x):
    ts = x.total_seconds()
    hours, remainder = divmod(ts, 3600)
    minutes, seconds = divmod(remainder, 60)
    return '{:02d}:{:02d}:{:02d}'.format(int(hours), int(minutes), int(seconds))
out = (df.assign(Hours=pd.to_timedelta(df['Hours']))
         .groupby(['Date', 'User ID'], as_index=False)[['Calls', 'Hours']]
         .sum()
         .assign(Hours=lambda x: x['Hours'].apply(f)))
print(out)
       Date  User ID  Calls     Hours
0  3-9-2022     2234     14  01:39:13
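If the row-wise apply(f) becomes slow on a large result, the same formatting can be done with Series operations. A sketch, where to_hhmmss is a hypothetical helper name and durations are assumed to be non-negative:

```python
import pandas as pd


def to_hhmmss(td: pd.Series) -> pd.Series:
    """Format a timedelta64 Series as 'HH:MM:SS' without a row-wise apply.

    Hours are not wrapped at 24, like f above. Assumes non-negative durations.
    """
    secs = td.dt.total_seconds().astype('int64')  # truncates fractional seconds
    hours = (secs // 3600).astype(str).str.zfill(2)
    minutes = (secs % 3600 // 60).astype(str).str.zfill(2)
    seconds = (secs % 60).astype(str).str.zfill(2)
    return hours + ':' + minutes + ':' + seconds


sums = pd.Series([pd.Timedelta(hours=1, minutes=39, seconds=13),
                  pd.Timedelta(hours=25, seconds=5)])
print(to_hhmmss(sums).tolist())  # ['01:39:13', '25:00:05']
```

In the chain above it would replace the last step as `.assign(Hours=lambda x: to_hhmmss(x['Hours']))`.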