pythonpandasaccelerometer

Summing within bouts in pandas dataframe


I am working with data generated by an accelerometer, and this particular model writes a new line of data every time there is a posture change (e.g., going from sitting to standing) or each time there is an individual step, and each row then has a duration. A few example rows (~30k rows per file):

ID Date Event Type Duration
1885 2021-06-29 4.00000 31208.50000
1885 2021-07-07 2.00000 1.10000
1885 2021-07-07 2.00000 3.20000
1885 2021-07-07 2.00000 3.50000
1885 2021-07-07 2.00000 4.30000
1885 2021-07-07 2.00000 3.70000
1885 2021-07-07 2.00000 1.20000
1885 2021-07-07 2.00000 1.10000
1885 2021-07-07 1.00000 1.00000

Event type describes whether it's laying, sitting, stepping, etc, and stepping is coded as a 2.

For each day in the dataset I'd like to sum up the duration column for each set of continuous rows with an event type of 2 so that I have the duration of each stepping bouts within each day (I'll later do some groupby math on those bouts to get things like average bout duration). I'm having a hard time figuring out what to search for here, so I appreciate any help!

I'm having a hard time figuring out what to search for here, so I appreciate any help! Obviously just grouping by Event type won't work as I need to group only within specific uninterrupted stepping bouts.


Solution

  • You can create a custom grouper:

    # identify events of type 2
    m = df['Event Type'].eq(2)
    
    # select only the type 2 rows
    # group by ID and successive rows
    # aggregate as sum
    out = (df[m]
           .groupby(['ID', (~m).groupby(df['ID']).cumsum()],
                    as_index=False)
           ['Duration'].sum()
          )
    

    Output (using an alternative input):

         ID  Duration
    0  1885      18.1
    1  1885       6.0
    

    Used input:

          ID        Date  Event Type  Duration
    0   1885  2021-06-29         4.0   31208.5
    1   1885  2021-07-07         2.0       1.1 #
    2   1885  2021-07-07         2.0       3.2 # set 1
    3   1885  2021-07-07         2.0       3.5 #
    4   1885  2021-07-07         2.0       4.3 #
    5   1885  2021-07-07         2.0       3.7 #
    6   1885  2021-07-07         2.0       1.2 #
    7   1885  2021-07-07         2.0       1.1 #
    8   1885  2021-07-07         1.0       1.0
    9   1885  2021-07-07         2.0       3.7 #
    10  1885  2021-07-07         2.0       1.2 # set 2
    11  1885  2021-07-07         2.0       1.1 #
    12  1885  2021-07-07         1.0       1.0