I am working with data generated by an accelerometer, and this particular model writes a new line of data every time there is a posture change (e.g., going from sitting to standing) or each time there is an individual step, and each row then has a duration. A few example rows (~30k rows per file):
ID | Date | Event Type | Duration |
---|---|---|---|
1885 | 2021-06-29 | 4.00000 | 31208.50000 |
1885 | 2021-07-07 | 2.00000 | 1.10000 |
1885 | 2021-07-07 | 2.00000 | 3.20000 |
1885 | 2021-07-07 | 2.00000 | 3.50000 |
1885 | 2021-07-07 | 2.00000 | 4.30000 |
1885 | 2021-07-07 | 2.00000 | 3.70000 |
1885 | 2021-07-07 | 2.00000 | 1.20000 |
1885 | 2021-07-07 | 2.00000 | 1.10000 |
1885 | 2021-07-07 | 1.00000 | 1.00000 |
Event type describes whether it's laying, sitting, stepping, etc, and stepping is coded as a 2.
For each day in the dataset I'd like to sum up the duration column for each set of continuous rows with an event type of 2 so that I have the duration of each stepping bouts within each day (I'll later do some groupby math on those bouts to get things like average bout duration). I'm having a hard time figuring out what to search for here, so I appreciate any help!
I'm having a hard time figuring out what to search for here, so I appreciate any help! Obviously just grouping by Event type won't work as I need to group only within specific uninterrupted stepping bouts.
You can create a custom grouper:
# identify events of type 2
m = df['Event Type'].eq(2)
# select only the type 2 rows
# group by ID and successive rows
# aggregate as sum
out = (df[m]
.groupby(['ID', (~m).groupby(df['ID']).cumsum()],
as_index=False)
['Duration'].sum()
)
Output (using an alternative input):
ID Duration
0 1885 18.1
1 1885 6.0
Used input:
ID Date Event Type Duration
0 1885 2021-06-29 4.0 31208.5
1 1885 2021-07-07 2.0 1.1 #
2 1885 2021-07-07 2.0 3.2 # set 1
3 1885 2021-07-07 2.0 3.5 #
4 1885 2021-07-07 2.0 4.3 #
5 1885 2021-07-07 2.0 3.7 #
6 1885 2021-07-07 2.0 1.2 #
7 1885 2021-07-07 2.0 1.1 #
8 1885 2021-07-07 1.0 1.0
9 1885 2021-07-07 2.0 3.7 #
10 1885 2021-07-07 2.0 1.2 # set 2
11 1885 2021-07-07 2.0 1.1 #
12 1885 2021-07-07 1.0 1.0