
Generate Label by First occurrence


Below is a snapshot of my data frame. I have IDs from 1 to N, each with daily outcomes of 1 or 0. I want to generate the Label and Csum columns in Python based on when the outcome of 1 occurs. How can I do this for all N IDs?

Label: mark the 3 days ending on each day where Outcome is 1 (that day and the 2 days before it) as lab1; all other days get 0.

Csum: a running count that increases by 1 each day and restarts at 1 on the day after Outcome is 1 (and on the first day of each ID).
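As a reference point, the two definitions can be read directly as a per-ID loop in plain Python. This is only a sketch, assuming each ID's outcomes are already ordered by date with exactly one entry per day; the function name `label_and_csum` and the `data` dictionary are hypothetical and not part of the original question.

```python
def label_and_csum(outcomes, window=3):
    """Return (labels, csums) for one ID's outcomes, ordered by date.

    labels: 'lab1' if the day's own outcome, or one of the next
            window - 1 outcomes, is 1; otherwise 0.
    csums:  running count that restarts at 1 on the day after an outcome of 1.
    """
    # a day is labelled if a 1 occurs on it or within the following window - 1 days
    labels = ["lab1" if 1 in outcomes[i:i + window] else 0 for i in range(len(outcomes))]

    csums = []
    count = 0
    for value in outcomes:
        count += 1
        csums.append(count)
        if value == 1:
            count = 0  # the following day starts again at 1
    return labels, csums


# hypothetical per-ID outcome lists (values copied from the sample data), ordered by date
data = {
    "ID_1": [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
    "ID_2": [0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0],
}

# loop through every ID; each ID starts with a fresh count
results = {id_: label_and_csum(values) for id_, values in data.items()}
for id_, (labels, csums) in results.items():
    print(id_, labels, csums)
```

A loop like this is easy to check by hand but slow for many IDs; the vectorized pandas answer below computes the same columns without a Python-level loop.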

My Dataset:

ID Date Outcome
ID_1 2021-01-01 0
ID_1 2021-01-02 0
ID_1 2021-01-03 0
ID_1 2021-01-04 0
ID_1 2021-01-05 0
ID_1 2021-01-06 0
ID_1 2021-01-07 0
ID_1 2021-01-08 1
ID_1 2021-01-09 0
ID_1 2021-01-10 0
ID_1 2021-01-11 0
ID_2 2021-01-01 0
ID_2 2021-01-02 0
ID_2 2021-01-03 1
ID_2 2021-01-04 0
ID_2 2021-01-05 0
ID_2 2021-01-06 0
ID_2 2021-01-07 0
ID_2 2021-01-08 0
ID_2 2021-01-09 1
ID_2 2021-01-10 0
ID_2 2021-01-11 0

My target output:

ID Date Outcome Label Csum
ID_1 2021-01-01 0 0 1
ID_1 2021-01-02 0 0 2
ID_1 2021-01-03 0 0 3
ID_1 2021-01-04 0 0 4
ID_1 2021-01-05 0 0 5
ID_1 2021-01-06 0 lab1 6
ID_1 2021-01-07 0 lab1 7
ID_1 2021-01-08 1 lab1 8
ID_1 2021-01-09 0 0 1
ID_1 2021-01-10 0 0 2
ID_1 2021-01-11 0 0 3
ID_2 2021-01-01 0 lab1 1
ID_2 2021-01-02 0 lab1 2
ID_2 2021-01-03 1 lab1 3
ID_2 2021-01-04 0 0 1
ID_2 2021-01-05 0 0 2
ID_2 2021-01-06 0 0 3
ID_2 2021-01-07 0 lab1 4
ID_2 2021-01-08 0 lab1 5
ID_2 2021-01-09 1 lab1 6
ID_2 2021-01-10 0 0 1
ID_2 2021-01-11 0 0 2

Solution

  • If I understand correctly (IIUC), you can use:

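    # assumes df has columns ID, Date, Outcome, sorted by ID then Date,
    # with exactly one row per day (the window counts rows, not dates)

    # flag rows where Outcome is 1 in the current row or the next 2 rows (per ID);
    # reversing the frame turns a trailing 3-row window into a leading one
    m = df[::-1].groupby('ID').rolling(3, min_periods=1)['Outcome'].max().droplevel(0)
    
    # label the flagged rows 'lab1' and keep 0 elsewhere (assignment aligns on the index)
    df['Label'] = m.mask(m.eq(1), 'lab1')

    # start a new counter group wherever a run of 0 labels begins
    group = (df['Label'].eq(0)&df['Label'].ne(df['Label'].shift())).cumsum()
    
    # number the rows 1, 2, 3, ... within each (ID, group)
    df['Csum'] = df.groupby(['ID', group]).cumcount().add(1)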
    

    output:

          ID        Date  Outcome Label  Csum
    0   ID_1  2021-01-01        0   0.0     1
    1   ID_1  2021-01-02        0   0.0     2
    2   ID_1  2021-01-03        0   0.0     3
    3   ID_1  2021-01-04        0   0.0     4
    4   ID_1  2021-01-05        0   0.0     5
    5   ID_1  2021-01-06        0  lab1     6
    6   ID_1  2021-01-07        0  lab1     7
    7   ID_1  2021-01-08        1  lab1     8
    8   ID_1  2021-01-09        0   0.0     1
    9   ID_1  2021-01-10        0   0.0     2
    10  ID_1  2021-01-11        0   0.0     3
    11  ID_2  2021-01-01        0  lab1     1
    12  ID_2  2021-01-02        0  lab1     2
    13  ID_2  2021-01-03        1  lab1     3
    14  ID_2  2021-01-04        0   0.0     1
    15  ID_2  2021-01-05        0   0.0     2
    16  ID_2  2021-01-06        0   0.0     3
    17  ID_2  2021-01-07        0  lab1     4
    18  ID_2  2021-01-08        0  lab1     5
    19  ID_2  2021-01-09        1  lab1     6
    20  ID_2  2021-01-10        0   0.0     1
    21  ID_2  2021-01-11        0   0.0     2
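Below is a self-contained sketch that rebuilds the sample data and runs the same Label step. For Csum it swaps in a different grouping key, assumed here to follow the stated rule more directly: within each ID, the number of 1s strictly before a row, so a new block starts on the row right after every 1. The answer's `group` key instead starts a new block where a run of 0 labels begins. That gives the same result on this sample, but for an ID with outcomes `0, 0, 1, 0, 1, 0` the two `lab1` windows overlap and it would yield Csum `1, 2, 3, 4, 5, 1` rather than `1, 2, 3, 1, 2, 1`. The names `outcomes`, `dates`, and `block` are illustrative only.

```python
import pandas as pd

# Rebuild the sample data (Date kept as strings, as in the question)
outcomes = {
    "ID_1": [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
    "ID_2": [0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0],
}
dates = [f"2021-01-{day:02d}" for day in range(1, 12)]
df = pd.DataFrame(
    [(id_, d, v) for id_, vals in outcomes.items() for d, v in zip(dates, vals)],
    columns=["ID", "Date", "Outcome"],
)

# Label: same approach as the answer (1 in the current row or the next 2 rows, per ID)
m = df[::-1].groupby("ID").rolling(3, min_periods=1)["Outcome"].max().droplevel(0)
df["Label"] = m.mask(m.eq(1), "lab1")

# Alternative Csum key: within each ID, count the 1s strictly before each row,
# so a new block starts on the row right after every Outcome of 1
block = df.groupby("ID")["Outcome"].transform(lambda s: s.shift(fill_value=0).cumsum())
df["Csum"] = df.groupby(["ID", block]).cumcount().add(1)
print(df)
```

Because `block` depends only on Outcome, Csum no longer relies on how the Label windows happen to line up.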