pythonpandasffill

Pandas ffill conditional upon another column


I have the following snippet from a larger dataframe:

pd.DataFrame({
    'sig_2':[False, False, True, True, True, True, True, True, True, True, True, False, False, False,
             False, True, False, True, True, True, True, True, True, True, True, True, False, False,
             False, False, True, True, True, True, False, True, True],
    'tr_cnt':[0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 0, 0, 0, 1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 0, 0, 0, 1, 2, 3, 4, 0, 1, 2],
    'fls_cnt':[53, 54, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 4, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 4, 0, 0, 0, 0, 1, 0, 0],
    'test1':[0, 0, 9, 9, 9, 9, 9, 9, 9, 9, 9, 0, 0, 0, 0, 1, 0, 9, 9, 9, 9, 9, 9, 9, 9, 9, 0, 0, 0, 0, 4, 4, 4, 4, 0, 16, 16],
    'test2':[54, 54, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 4, 4, 4, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 4, 4, 4, 0, 0, 0, 0, 1, 0, 0],
    'check1':[False, False, True, False, False, False, False, False, False, False, False, False, False,
              False, False, False, False, False, False, False, False, False, False, False, False, False,
              False, False, False, False, True, False, False, False, False, False, False]})

tr_cnt is simply the count of successive True rows in sig_2 while fls_cnt does the same for the False rows. test1 and test2 are the total counts of the successive rows. check1 is where the following condition is True: df['test1'] > 2) & (df['fls_cnt'].shift(1) >= 2)

What i would like to do is forward fill the True values in the check1 column by the corresponding value in the test1 column.

For example, the first True row in the check1 column has a corresponding value of 9 in the test column. Thus, i would like to forward fill this value by another 8 rows. The next True has a correpsonding value of 4 so i want this to forward fill by another 3 rows.

I have tried the following different methods:

mask = (df['test1'] > 2) & (df['fls_cnt'].shift(1) >= 2)

# Trial 1
groups = mask.ne(mask.shift()).cumsum()
df['test1'] = df.groupby(groups)['test1'].ffill().astype(bool)


# Trial 2
df['test2'] = df.groupby(mask)['test1'].ffill().astype(bool)


# Trial 3
groups = (~mask).cumsum()
df['test3'] = df.groupby(groups)['test1'].ffill().astype(bool)

# Trial 4
df["test4"] = df.groupby(mask.cumsum())['test1'].ffill().fillna(False).astype(bool)

In all of these methods, they seem to do what i want where the condition is True, but they are also including rows that dont satisfy this condition which is odd.


Solution

  • As @mozway points out in the comments to your question, it is a little unclear what you are trying to do. I suggest here my interpretation of your logic. If it isn't what you were expecting, I'll simply remove the answer. But in that case, I also suggest you review you question and elaborate, be more clear:

    import pandas as pd
    import numpy as np
    
    df = pd.DataFrame({
        'sig_2': [False, False, True, True, True, True, True, True, True, True, True, False, False, False,
                  False, True, False, True, True, True, True, True, True, True, True, True, False, False,
                  False, False, True, True, True, True, False, True, True],
        'tr_cnt': [0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 0, 0, 0, 1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 0, 0, 0, 1, 2, 3, 4, 0, 1, 2],
        'fls_cnt': [53, 54, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 4, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 4, 0, 0, 0, 0, 1, 0, 0],
        'test1': [0, 0, 9, 9, 9, 9, 9, 9, 9, 9, 9, 0, 0, 0, 0, 1, 0, 9, 9, 9, 9, 9, 9, 9, 9, 9, 0, 0, 0, 0, 4, 4, 4, 4, 0, 16, 16],
        'test2': [54, 54, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 4, 4, 4, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 4, 4, 4, 0, 0, 0, 0, 1, 0, 0],
        'check1': [False, False, True, False, False, False, False, False, False, False, False, False, False,
                   False, False, False, False, False, False, False, False, False, False, False, False, False,
                   False, False, False, False, True, False, False, False, False, False, False]
    })
    df['forward_fill'] = False  
    
    for i in df.index[df['check1']]:
        end_index = i + df.at[i, 'test1']  
        if end_index > len(df):
            end_index = len(df)
        
        df.loc[i:end_index, 'forward_fill'] = True
    
    print(df)
    
    

    Which gives:

       sig_2  tr_cnt  fls_cnt  test1  test2  check1  forward_fill
    0   False       0       53      0     54   False         False
    1   False       0       54      0     54   False         False
    2    True       1        0      9      0    True          True
    3    True       2        0      9      0   False          True
    4    True       3        0      9      0   False          True
    5    True       4        0      9      0   False          True
    6    True       5        0      9      0   False          True
    7    True       6        0      9      0   False          True
    8    True       7        0      9      0   False          True
    9    True       8        0      9      0   False          True
    10   True       9        0      9      0   False          True
    11  False       0        1      0      4   False          True
    12  False       0        2      0      4   False         False
    13  False       0        3      0      4   False         False
    14  False       0        4      0      4   False         False
    15   True       1        0      1      0   False         False
    16  False       0        1      0      1   False         False
    17   True       1        0      9      0   False         False
    18   True       2        0      9      0   False         False
    19   True       3        0      9      0   False         False
    20   True       4        0      9      0   False         False
    21   True       5        0      9      0   False         False
    22   True       6        0      9      0   False         False
    23   True       7        0      9      0   False         False
    24   True       8        0      9      0   False         False
    25   True       9        0      9      0   False         False
    26  False       0        1      0      4   False         False
    27  False       0        2      0      4   False         False
    28  False       0        3      0      4   False         False
    29  False       0        4      0      4   False         False
    30   True       1        0      4      0    True          True
    31   True       2        0      4      0   False          True
    32   True       3        0      4      0   False          True
    33   True       4        0      4      0   False          True
    34  False       0        1      0      1   False          True
    35   True       1        0     16      0   False         False
    36   True       2        0     16      0   False         False
    

    Update: An alternative way (more vectorized) would be:

    import pandas as pd
    import numpy as np
    
    df = pd.DataFrame({
        'sig_2': [False, False, True, True, True, True, True, True, True, True, True, False, False, False,
                  False, True, False, True, True, True, True, True, True, True, True, True, False, False,
                  False, False, True, True, True, True, False, True, True],
        'tr_cnt': [0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 0, 0, 0, 1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 0, 0, 0, 1, 2, 3, 4, 0, 1, 2],
        'fls_cnt': [53, 54, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 4, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 4, 0, 0, 0, 0, 1, 0, 0],
        'test1': [0, 0, 9, 9, 9, 9, 9, 9, 9, 9, 9, 0, 0, 0, 0, 1, 0, 9, 9, 9, 9, 9, 9, 9, 9, 9, 0, 0, 0, 0, 4, 4, 4, 4, 0, 16, 16],
        'test2': [54, 54, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 4, 4, 4, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 4, 4, 4, 0, 0, 0, 0, 1, 0, 0],
        'check1': [False, False, True, False, False, False, False, False, False, False, False, False, False,
                   False, False, False, False, False, False, False, False, False, False, False, False, False,
                   False, False, False, False, True, False, False, False, False, False, False]
    })
    
    indices = np.repeat(df.index[df['check1']], df['test1'][df['check1']])
    ranges = np.concatenate([np.arange(start, start + length) for start, length in zip(df.index[df['check1']], df['test1'][df['check1']])])
    mask = np.zeros(len(df), dtype=bool)
    mask[ranges] = True
    df['forward_fill'] = mask
    print(df)
    

    which retunrs the same output:

        sig_2  tr_cnt  fls_cnt  test1  test2  check1  forward_fill
    0   False       0       53      0     54   False         False
    1   False       0       54      0     54   False         False
    2    True       1        0      9      0    True          True
    3    True       2        0      9      0   False          True
    4    True       3        0      9      0   False          True
    5    True       4        0      9      0   False          True
    6    True       5        0      9      0   False          True
    7    True       6        0      9      0   False          True
    8    True       7        0      9      0   False          True
    9    True       8        0      9      0   False          True
    10   True       9        0      9      0   False          True
    11  False       0        1      0      4   False         False
    12  False       0        2      0      4   False         False
    13  False       0        3      0      4   False         False
    14  False       0        4      0      4   False         False
    15   True       1        0      1      0   False         False
    16  False       0        1      0      1   False         False
    17   True       1        0      9      0   False         False
    18   True       2        0      9      0   False         False
    19   True       3        0      9      0   False         False
    20   True       4        0      9      0   False         False
    21   True       5        0      9      0   False         False
    22   True       6        0      9      0   False         False
    23   True       7        0      9      0   False         False
    24   True       8        0      9      0   False         False
    25   True       9        0      9      0   False         False
    26  False       0        1      0      4   False         False
    27  False       0        2      0      4   False         False
    28  False       0        3      0      4   False         False
    29  False       0        4      0      4   False         False
    30   True       1        0      4      0    True          True
    31   True       2        0      4      0   False          True
    32   True       3        0      4      0   False          True
    33   True       4        0      4      0   False          True
    34  False       0        1      0      1   False         False
    35   True       1        0     16      0   False         False
    36   True       2        0     16      0   False         False