I have a df in r with numerous records with the below format, with 'arrival_time' values for a 12 hour period'.
id | arrival_time | wait_time_value |
---|---|---|
1 | 2020-02-20 12:02:00 | 10 |
2 | 2020-02-20 12:04:00 | 5 |
99900 | 2020-02-20 23:47:00 | 8 |
10000 | 2020-02-20 23:59:00 | 21 |
I would like to create a new df that has a row for each 15 minute slot of the arrival time period and the wait_time_value of the record with the earliest arrival time in that slot. So, in the above example, the first and last row of the new df would look like:
id | period_start | wait_time_value |
---|---|---|
1 | 2020-02-20 12:00:00 | 10 |
48 | 2020-02-20 23:45:00 | 8 |
I have used the below code to achieve this for the mean average wait time for all records in each 15 minute range, but i'm not sure how to select the value for the earliest record?
df$period_start <- align.time(df$arrival_time- 899, n = 60*15)
avgwait_df <- aggregate(wait_time_value ~ period_start, df, mean)
Use DataFrame.resample
with GroupBy.first
, remove only NaN
s and convert to DataFrame
:
df['arrival_time'] = pd.to_datetime(df['arrival_time'])
df = (df.resample('15Min', on='arrival_time')['wait_time_value']
.first()
.dropna()
.reset_index(name='wait_time_value'))
print (df)
arrival_time wait_time_value
0 2020-02-20 12:00:00 10.0
1 2020-02-20 23:45:00 8.0