Below is a snapshot of my data frame, and I have a list of ID from 1 to N, with outcomes 1 or 0. What I am trying to do is to generate the Label and csum columns basic on the occurrence of the outcome in Python, how do I achieve this and loop through N Ids?
Label: the past 3 days when outcome 1 occur
csum: Keep Counting and sum 1 until 1 occurs, then reset again
My Dataset:
ID | Date | Outcome |
---|---|---|
ID_1 | 2021-01-01 | 0 |
ID_1 | 2021-01-02 | 0 |
ID_1 | 2021-01-03 | 0 |
ID_1 | 2021-01-04 | 0 |
ID_1 | 2021-01-05 | 0 |
ID_1 | 2021-01-06 | 0 |
ID_1 | 2021-01-07 | 0 |
ID_1 | 2021-01-08 | 1 |
ID_1 | 2021-01-09 | 0 |
ID_1 | 2021-01-10 | 0 |
ID_1 | 2021-01-11 | 0 |
ID_2 | 2021-01-01 | 0 |
ID_2 | 2021-01-02 | 0 |
ID_2 | 2021-01-03 | 1 |
ID_2 | 2021-01-04 | 0 |
ID_2 | 2021-01-05 | 0 |
ID_2 | 2021-01-06 | 0 |
ID_2 | 2021-01-07 | 0 |
ID_2 | 2021-01-08 | 0 |
ID_2 | 2021-01-09 | 1 |
ID_2 | 2021-01-10 | 0 |
ID_2 | 2021-01-11 | 0 |
My Target Outcome:
ID | Date | Outcome | Label | Csum |
---|---|---|---|---|
ID_1 | 2021-01-01 | 0 | 0 | 1 |
ID_1 | 2021-01-02 | 0 | 0 | 2 |
ID_1 | 2021-01-03 | 0 | 0 | 3 |
ID_1 | 2021-01-04 | 0 | 0 | 4 |
ID_1 | 2021-01-05 | 0 | 0 | 5 |
ID_1 | 2021-01-06 | 0 | lab1 | 6 |
ID_1 | 2021-01-07 | 0 | lab1 | 7 |
ID_1 | 2021-01-08 | 1 | lab1 | 8 |
ID_1 | 2021-01-09 | 0 | 0 | 1 |
ID_1 | 2021-01-10 | 0 | 0 | 2 |
ID_1 | 2021-01-11 | 0 | 0 | 3 |
ID_2 | 2021-01-01 | 0 | lab1 | 1 |
ID_2 | 2021-01-02 | 0 | lab1 | 2 |
ID_2 | 2021-01-03 | 1 | lab1 | 3 |
ID_2 | 2021-01-04 | 0 | 0 | 1 |
ID_2 | 2021-01-05 | 0 | 0 | 2 |
ID_2 | 2021-01-06 | 0 | 0 | 3 |
ID_2 | 2021-01-07 | 0 | lab1 | 4 |
ID_2 | 2021-01-08 | 0 | lab1 | 5 |
ID_2 | 2021-01-09 | 1 | lab1 | 6 |
ID_2 | 2021-01-10 | 0 | 0 | 1 |
ID_2 | 2021-01-11 | 0 | 0 | 2 |
IIUC, you can use:
# identify the rows with a 1 in the next 3 rows (per group)
m = df[::-1].groupby('ID').rolling(3, min_periods=1)['Outcome'].max().droplevel(0)
# identify the rows where 0 restarts
df['Label'] = m.mask(m.eq(1), 'lab1')
group = (df['Label'].eq(0)&df['Label'].ne(df['Label'].shift())).cumsum()
# compute cumcount
df['Csum'] = df.groupby(['ID', group]).cumcount().add(1)
output:
ID Date Outcome Label Csum
0 ID_1 2021-01-01 0 0.0 1
1 ID_1 2021-01-02 0 0.0 2
2 ID_1 2021-01-03 0 0.0 3
3 ID_1 2021-01-04 0 0.0 4
4 ID_1 2021-01-05 0 0.0 5
5 ID_1 2021-01-06 0 lab1 6
6 ID_1 2021-01-07 0 lab1 7
7 ID_1 2021-01-08 1 lab1 8
8 ID_1 2021-01-09 0 0.0 1
9 ID_1 2021-01-10 0 0.0 2
10 ID_1 2021-01-11 0 0.0 3
11 ID_2 2021-01-01 0 lab1 1
12 ID_2 2021-01-02 0 lab1 2
13 ID_2 2021-01-03 1 lab1 3
14 ID_2 2021-01-04 0 0.0 1
15 ID_2 2021-01-05 0 0.0 2
16 ID_2 2021-01-06 0 0.0 3
17 ID_2 2021-01-07 0 lab1 4
18 ID_2 2021-01-08 0 lab1 5
19 ID_2 2021-01-09 1 lab1 6
20 ID_2 2021-01-10 0 0.0 1
21 ID_2 2021-01-11 0 0.0 2