pythonpandas

Count occurrences of each type of event within a time window in Pandas


I have a DataFrame with the following structure:

I need to add a column for each unique event_type to count how many events of that event_type occurred within a 10ms window before each row's event_timestamp.

data = {
    'event_timestamp': [
        '2024-02-01 08:02:09.065315961', '2024-02-01 08:02:09.125612099', '2024-02-01 08:02:09.160326512',
        '2024-02-01 08:02:09.540206541', '2024-02-01 08:02:09.571751697', '2024-02-01 08:02:09.571784060',
        '2024-02-01 08:02:09.574368029', '2024-02-01 08:02:09.574390737', '2024-02-01 08:02:09.578245099',
        '2024-02-01 08:02:10.077399943', '2024-02-01 08:02:10.077424252', '2024-02-01 08:02:10.081648527'
    ],
    'event_type': [
        'A', 'B', 'A', 'A', 'C', 'B', 'A', 'C', 'B', 'A', 'C', 'B'
    ]
}

df = pd.DataFrame(data)
df['event_timestamp'] = pd.to_datetime(df['event_timestamp'])

For the above input, I want an output like this:

             event_timestamp         event_type  count_A  count_B  count_C
0  2024-02-01 08:02:09.065315961          A        0        0        0
1  2024-02-01 08:02:09.125612099          B        0        0        0
2  2024-02-01 08:02:09.160326512          A        0        0        0
3  2024-02-01 08:02:09.540206541          A        0        0        0
4  2024-02-01 08:02:09.571751697          C        0        0        0
5  2024-02-01 08:02:09.571784060          B        0        0        1
6  2024-02-01 08:02:09.574368029          A        0        1        1
7  2024-02-01 08:02:09.574390737          C        1        1        1
8  2024-02-01 08:02:09.578245099          B        1        1        2
9  2024-02-01 08:02:10.077399943          A        0        0        0
10 2024-02-01 08:02:10.077424252          C        1        0        0
11 2024-02-01 08:02:10.081648527          B        1        1        0

Solution

  • IIUC, you could produce the columns with get_dummies, then perform a rolling.sum on 10ms to get the counts, finally merge back to the original DataFrame:

    out = df.merge(pd
       .get_dummies(df['event_type']).add_prefix('count_')
       .set_axis(df['event_timestamp']).sort_index()
       .rolling('10ms').sum().convert_dtypes(),
                   left_on='event_timestamp', right_index=True,
    )
    

    Variant:

    out = df.merge(df
       .set_index('event_timestamp').sort_index()
       ['event_type'].str.get_dummies().add_prefix('count_')
       .rolling('10ms').sum().convert_dtypes(),
                   left_on='event_timestamp', right_index=True,
    )
    

    Output:

                     event_timestamp event_type  count_A  count_B  count_C
    0  2024-02-01 08:02:09.065315961          A        1        0        0
    1  2024-02-01 08:02:09.125612099          B        0        1        0
    2  2024-02-01 08:02:09.160326512          A        1        0        0
    3  2024-02-01 08:02:09.540206541          A        1        0        0
    4  2024-02-01 08:02:09.571751697          C        0        0        1
    5  2024-02-01 08:02:09.571784060          B        0        1        1
    6  2024-02-01 08:02:09.574368029          A        1        1        1
    7  2024-02-01 08:02:09.574390737          C        1        1        2
    8  2024-02-01 08:02:09.578245099          B        1        2        2
    9  2024-02-01 08:02:10.077399943          A        1        0        0
    10 2024-02-01 08:02:10.077424252          C        1        0        1
    11 2024-02-01 08:02:10.081648527          B        1        1        1
    

    And if you want only the previous:

    tmp = (pd.get_dummies(df['event_type']).add_prefix('count_')
             .set_axis(df['event_timestamp']).sort_index()
          )
    
    out = df.merge(tmp.rolling('10ms').sum().sub(tmp).convert_dtypes(),
                   left_on='event_timestamp', right_index=True,
    )
    

    Output:

                     event_timestamp event_type  count_A  count_B  count_C
    0  2024-02-01 08:02:09.065315961          A        0        0        0
    1  2024-02-01 08:02:09.125612099          B        0        0        0
    2  2024-02-01 08:02:09.160326512          A        0        0        0
    3  2024-02-01 08:02:09.540206541          A        0        0        0
    4  2024-02-01 08:02:09.571751697          C        0        0        0
    5  2024-02-01 08:02:09.571784060          B        0        0        1
    6  2024-02-01 08:02:09.574368029          A        0        1        1
    7  2024-02-01 08:02:09.574390737          C        1        1        1
    8  2024-02-01 08:02:09.578245099          B        1        1        2
    9  2024-02-01 08:02:10.077399943          A        0        0        0
    10 2024-02-01 08:02:10.077424252          C        1        0        0
    11 2024-02-01 08:02:10.081648527          B        1        0        1