pythonpandasconditional-statementsfillnaffill

Pandas DataFrame conditional forward filling based on first row values


I have the following DataFrame:

import pandas as pd
df = pd.DataFrame({
    'col1':['A',pd.NA,pd.NA,pd.NA,pd.NA, 'B', pd.NA, pd.NA],
    'col2':[9.5, 6,24,8, 30, 7, 6, 8],
    })
print(df)

Giving:

   col1  col2
0     A   9.5
1  <NA>   6.0
2  <NA>  24.0
3  <NA>   8.0
4  <NA>  30.0
5     B   7.0
6  <NA>   6.0
7  <NA>   8.0

What I'd like to achieve is to forward fill col1, but not in rows where the value in col2 is greater than the col2 value of the row I'm forward filling from. Like so:

   col1  col2
0     A   9.5
1     A   6.0
2  <NA>  24.0
3     A   8.0
4  <NA>  30.0
5     B   7.0
6     B   6.0
7  <NA>   8.0

Here, index 1 and 3 are forward filled, but index 2 and 4 are not, as the col2 values (24, 30) are greater than the initial row's col2 value (9.5)

I can achieve the desired result by iterating through the df, like so:

val2 = -1
for i, r in df.iterrows():
    if not pd.isnull(r['col1']):
        val1 = r['col1']
        val2 = r['col2']
    else:
        if r['col2']<val2:
            df.loc[i, 'col1'] = val1
print(df)

Is there a way to achieve this without using iterrows()?


Solution

  • You can forward filling missing values to helper Series s and then compare groups by this Series - first value by all values and if values are not greater replace values in numpy.where:

    s = df['col1'].ffill()
    
    df['col1'] = np.where(df.groupby(s)['col2'].transform('first').gt(df['col2']),s,df['col1'])
    print (df)
       col1  col2
    0     A   9.5
    1     A   6.0
    2  <NA>  24.0
    3     A   8.0
    4  <NA>  30.0
    5     B   7.0
    6     B   6.0
    7  <NA>   8.0