I have a DataFrame with the following structure:
I need to add a column for each unique event_type to count how many events of that event_type occurred within a 10ms window before each row's event_timestamp.
import pandas as pd

data = {
'event_timestamp': [
'2024-02-01 08:02:09.065315961', '2024-02-01 08:02:09.125612099', '2024-02-01 08:02:09.160326512',
'2024-02-01 08:02:09.540206541', '2024-02-01 08:02:09.571751697', '2024-02-01 08:02:09.571784060',
'2024-02-01 08:02:09.574368029', '2024-02-01 08:02:09.574390737', '2024-02-01 08:02:09.578245099',
'2024-02-01 08:02:10.077399943', '2024-02-01 08:02:10.077424252', '2024-02-01 08:02:10.081648527'
],
'event_type': [
'A', 'B', 'A', 'A', 'C', 'B', 'A', 'C', 'B', 'A', 'C', 'B'
]
}
df = pd.DataFrame(data)
df['event_timestamp'] = pd.to_datetime(df['event_timestamp'])
For the above input, I want an output like this:
event_timestamp event_type count_A count_B count_C
0 2024-02-01 08:02:09.065315961 A 0 0 0
1 2024-02-01 08:02:09.125612099 B 0 0 0
2 2024-02-01 08:02:09.160326512 A 0 0 0
3 2024-02-01 08:02:09.540206541 A 0 0 0
4 2024-02-01 08:02:09.571751697 C 0 0 0
5 2024-02-01 08:02:09.571784060 B 0 0 1
6 2024-02-01 08:02:09.574368029 A 0 1 1
7 2024-02-01 08:02:09.574390737 C 1 1 1
8 2024-02-01 08:02:09.578245099 B 1 1 2
9 2024-02-01 08:02:10.077399943 A 0 0 0
10 2024-02-01 08:02:10.077424252 C 1 0 0
11 2024-02-01 08:02:10.081648527 B 1 0 1
The columns count_A, count_B, and count_C represent the number of occurrences of event_type 'A', 'B', and 'C' that happened within a 10ms window before each row's event_timestamp.
For example, for the row with event_timestamp 2024-02-01 08:02:09.065315961, all three counts are 0, since no earlier events fall within the 10ms before it.
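To make the windowing logic concrete, here is a brute-force sketch (my addition, not part of the question) that reproduces the table above with an explicit per-row scan:

```python
import pandas as pd

df = pd.DataFrame({
    'event_timestamp': pd.to_datetime([
        '2024-02-01 08:02:09.065315961', '2024-02-01 08:02:09.125612099',
        '2024-02-01 08:02:09.160326512', '2024-02-01 08:02:09.540206541',
        '2024-02-01 08:02:09.571751697', '2024-02-01 08:02:09.571784060',
        '2024-02-01 08:02:09.574368029', '2024-02-01 08:02:09.574390737',
        '2024-02-01 08:02:09.578245099', '2024-02-01 08:02:10.077399943',
        '2024-02-01 08:02:10.077424252', '2024-02-01 08:02:10.081648527',
    ]),
    'event_type': ['A', 'B', 'A', 'A', 'C', 'B', 'A', 'C', 'B', 'A', 'C', 'B'],
})

window = pd.Timedelta('10ms')
for ev in ['A', 'B', 'C']:
    df[f'count_{ev}'] = [
        # events of type `ev` strictly before t, at most 10ms earlier
        ((df['event_timestamp'] > t - window)
         & (df['event_timestamp'] < t)
         & (df['event_type'] == ev)).sum()
        for t in df['event_timestamp']
    ]
```

This is O(n²), so it is only a reference for the expected output, not a solution for large frames.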
IIUC, you could produce the columns with get_dummies, then perform a rolling.sum over 10ms to get the counts, and finally merge back to the original DataFrame:
out = df.merge(pd
.get_dummies(df['event_type']).add_prefix('count_')
.set_axis(df['event_timestamp']).sort_index()
.rolling('10ms').sum().convert_dtypes(),
left_on='event_timestamp', right_index=True,
)
Variant:
out = df.merge(df
.set_index('event_timestamp').sort_index()
['event_type'].str.get_dummies().add_prefix('count_')
.rolling('10ms').sum().convert_dtypes(),
left_on='event_timestamp', right_index=True,
)
Output:
event_timestamp event_type count_A count_B count_C
0 2024-02-01 08:02:09.065315961 A 1 0 0
1 2024-02-01 08:02:09.125612099 B 0 1 0
2 2024-02-01 08:02:09.160326512 A 1 0 0
3 2024-02-01 08:02:09.540206541 A 1 0 0
4 2024-02-01 08:02:09.571751697 C 0 0 1
5 2024-02-01 08:02:09.571784060 B 0 1 1
6 2024-02-01 08:02:09.574368029 A 1 1 1
7 2024-02-01 08:02:09.574390737 C 1 1 2
8 2024-02-01 08:02:09.578245099 B 1 2 2
9 2024-02-01 08:02:10.077399943 A 1 0 0
10 2024-02-01 08:02:10.077424252 C 1 0 1
11 2024-02-01 08:02:10.081648527 B 1 1 1
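Note that the counts above include the current row itself: a time-based rolling window is closed on the right by default, so each window is (t - 10ms, t]. A small illustration of that behavior (my example, not from the post):

```python
import pandas as pd

idx = pd.to_datetime(['2024-01-01 00:00:00.000',
                      '2024-01-01 00:00:00.004',
                      '2024-01-01 00:00:00.015'])
s = pd.Series([1, 1, 1], index=idx)

# each window (t - 10ms, t] includes the row at t itself
print(s.rolling('10ms').sum().tolist())         # [1.0, 2.0, 1.0]

# subtracting the row's own value keeps only earlier rows
print(s.rolling('10ms').sum().sub(s).tolist())  # [0.0, 1.0, 0.0]
```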
And if you want to count only the previous events, excluding the current row:
tmp = (pd.get_dummies(df['event_type']).add_prefix('count_')
.set_axis(df['event_timestamp']).sort_index()
)
out = df.merge(tmp.rolling('10ms').sum().sub(tmp).convert_dtypes(),
left_on='event_timestamp', right_index=True,
)
Output:
event_timestamp event_type count_A count_B count_C
0 2024-02-01 08:02:09.065315961 A 0 0 0
1 2024-02-01 08:02:09.125612099 B 0 0 0
2 2024-02-01 08:02:09.160326512 A 0 0 0
3 2024-02-01 08:02:09.540206541 A 0 0 0
4 2024-02-01 08:02:09.571751697 C 0 0 0
5 2024-02-01 08:02:09.571784060 B 0 0 1
6 2024-02-01 08:02:09.574368029 A 0 1 1
7 2024-02-01 08:02:09.574390737 C 1 1 1
8 2024-02-01 08:02:09.578245099 B 1 1 2
9 2024-02-01 08:02:10.077399943 A 0 0 0
10 2024-02-01 08:02:10.077424252 C 1 0 0
11 2024-02-01 08:02:10.081648527 B 1 0 1
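For large frames, the same "previous events only" counts can also be computed without rolling, using cumulative sums and numpy.searchsorted on the sorted timestamps. A sketch under the assumption that df is built as in the question:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'event_timestamp': pd.to_datetime([
        '2024-02-01 08:02:09.065315961', '2024-02-01 08:02:09.125612099',
        '2024-02-01 08:02:09.160326512', '2024-02-01 08:02:09.540206541',
        '2024-02-01 08:02:09.571751697', '2024-02-01 08:02:09.571784060',
        '2024-02-01 08:02:09.574368029', '2024-02-01 08:02:09.574390737',
        '2024-02-01 08:02:09.578245099', '2024-02-01 08:02:10.077399943',
        '2024-02-01 08:02:10.077424252', '2024-02-01 08:02:10.081648527',
    ]),
    'event_type': ['A', 'B', 'A', 'A', 'C', 'B', 'A', 'C', 'B', 'A', 'C', 'B'],
})

ts = df['event_timestamp'].to_numpy()  # assumes df is sorted by timestamp
dummies = pd.get_dummies(df['event_type'], dtype=int).add_prefix('count_')

# prefix[k] holds the per-type totals of the first k rows
prefix = np.vstack([np.zeros(dummies.shape[1], dtype=int),
                    dummies.to_numpy().cumsum(axis=0)])

# lo[i] = first index whose timestamp is strictly inside (t - 10ms, t)
lo = np.searchsorted(ts, ts - np.timedelta64(10, 'ms'), side='right')
hi = np.arange(len(ts))  # current row excluded

out = df.join(pd.DataFrame(prefix[hi] - prefix[lo], columns=dummies.columns))
```

The window boundaries here match the rolling version above ((t - 10ms, t), current row excluded), and the whole computation is O(n log n).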