I have a DataFrame; you can reproduce it by copying and running the following code:
import pandas as pd
from io import StringIO
df = """
b_id  duration1                duration2                 user
366   NaN                      38 days 22:05:06.807430   Test
367   0 days 00:00:05.285239   NaN                       Test
368   NaN                      NaN                       Test
371   NaN                      NaN                       Test
378   NaN                      451 days 14:59:28.830482  Test
384   28 days 21:05:16.141263  0 days 00:00:44.999706    Test
466   NaN                      38 days 22:05:06.807430   Tom
467   0 days 00:00:05.285239   NaN                       Tom
468   NaN                      NaN                       Tom
471   NaN                      NaN                       Tom
478   NaN                      451 days 14:59:28.830482  Tom
484   28 days 21:05:16.141263  0 days 00:00:44.999706    Tom
"""
# Columns are separated by two or more spaces; the raw string avoids an invalid-escape warning
df = pd.read_csv(StringIO(df.strip()), sep=r'\s\s+', engine='python')
df
My question is: how can I get the mean of each duration column for each user?
The output should look something like this (the values are made up for illustration, not the actual means):
mean_duration1 mean_duration2 user
8 days 22:05:06.807430 3 days 22:05:06.807430 Test
2 days 00:00:05.285239 4 days 22:05:06.807430 Tom
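The duration columns are read in as strings, so convert them with pd.to_timedelta first, then group by user and take the mean (missing values are skipped). You can use: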
out = (df
       .set_index('user')
       .filter(like='duration')
       .apply(pd.to_timedelta)
       .groupby(level=0).mean()
       .reset_index()
      )
Output:
user duration1 duration2
0 Test 14 days 10:32:40.713251 163 days 12:21:46.879206
1 Tom 14 days 10:32:40.713251 163 days 12:21:46.879206
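If you also want the mean_duration1 / mean_duration2 column names from the question, one possible variant is to add a prefix after aggregating. This is only a minimal sketch: the small frame below is made-up illustrative data (not the data from the question), and the names cols and out are arbitrary.

```python
import pandas as pd

# Made-up sample data for illustration only; None stands in for missing durations
df = pd.DataFrame({
    "user": ["Test", "Test", "Tom", "Tom"],
    "duration1": ["0 days 00:00:10", None, "1 days 00:00:00", "3 days 00:00:00"],
    "duration2": [None, "2 days 00:00:00", "0 days 12:00:00", None],
})

cols = ["duration1", "duration2"]

# Convert the string columns to timedelta64; missing values become NaT
df[cols] = df[cols].apply(pd.to_timedelta)

out = (df
       .groupby("user")[cols]
       .mean()                 # NaT entries are ignored when averaging
       .add_prefix("mean_")    # duration1 -> mean_duration1, etc.
       .reset_index()
      )
print(out)
```

Here add_prefix only renames the aggregated columns, so the grouping key user keeps its name after reset_index.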