I have a dataframe like this:
import pandas as pd

data = {
    "timeStamp": ['06:00:00', '06:03:00', '06:10:00', '06:30:00', '06:32:00', '06:02:00', '06:05:00', '06:06:00', '06:55:00', '06:00:00', '06:01:00', '06:20:00', '07:00:00'],
    "Event": ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'C', 'C', 'C', 'D']
}
df = pd.DataFrame(data)
For each group, I need to find the shortest interval that contains 3 or more rows.
In the given example we see:
The desired output looks like this:
Event | Interval |
---|---|
A | 00:10:00 |
B | 00:04:00 |
C | 00:20:00 |
D | NA |
...
etc
Is there an elegant way to do this?
You can use groupby("Event") and then apply a custom aggregation function to each group:
# Convert to datetime to compute intervals
df['timeStamp'] = pd.to_datetime(df['timeStamp'])

def find_shortest_interval_3(group):
    # With fewer than 3 rows, no interval can contain 3 rows
    if len(group) < 3:
        return None
    group = group.sort_values('timeStamp')
    min_interval = pd.Timedelta.max
    # Slide a window over 3 consecutive rows; the window size could be parameterized
    for i in range(len(group) - 2):
        current_interval = group.iloc[i + 2]['timeStamp'] - group.iloc[i]['timeStamp']
        if current_interval < min_interval:
            min_interval = current_interval
    return min_interval

print(df.groupby('Event').apply(find_shortest_interval_3))
Output:
Event
A    0 days 00:10:00
B    0 days 00:04:00
C    0 days 00:20:00
D                NaT
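The same sliding-window idea can also be written without an explicit loop, with the window size as a parameter. This is only a sketch under a few assumptions: the helper name `shortest_interval` and its parameter `n` are made up for illustration, and the timestamps are parsed with `pd.to_timedelta` instead of `pd.to_datetime`, so no date gets attached to them. After sorting within each group, the span of any `n` consecutive rows is the current timestamp minus the one `n - 1` rows earlier in the same group.

```python
import pandas as pd

data = {
    "timeStamp": ['06:00:00', '06:03:00', '06:10:00', '06:30:00', '06:32:00',
                  '06:02:00', '06:05:00', '06:06:00', '06:55:00',
                  '06:00:00', '06:01:00', '06:20:00', '07:00:00'],
    "Event": ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'C', 'C', 'C', 'D'],
}
df = pd.DataFrame(data)

# Assumption: treat the strings as time-of-day offsets rather than datetimes
df["timeStamp"] = pd.to_timedelta(df["timeStamp"])


def shortest_interval(frame, n=3):
    """Hypothetical helper: shortest span covering n consecutive rows per Event."""
    ordered = frame.sort_values(["Event", "timeStamp"])
    # Timestamp n - 1 rows earlier within the same group (NaT where there is none)
    earlier = ordered.groupby("Event")["timeStamp"].shift(n - 1)
    spans = ordered["timeStamp"] - earlier
    # min() skips NaT; groups with fewer than n rows end up as NaT
    return spans.groupby(ordered["Event"]).min()


result = shortest_interval(df, n=3)
print(result)
```

With `n=3` this should give the same intervals as the loop version (10, 4 and 20 minutes for A, B and C, and NaT for D). Changing `n` changes how many rows the interval has to contain.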