python-3.x pandas chain

Implementing a row-wise loop calculation inside a pandas method chain


I have the following block of code:

import pandas as pd
dat = (pd.DataFrame({'xx1' : [3,2,1], 'aa2' : ['qq', 'pp', 'qq'], 'xx3' : [4,5,6]})
        .sort_values(by = 'xx1')
        .reset_index(drop = True))
dat
for i in range(1, dat.shape[0]) :
    if (dat.loc[i, 'aa2'] == 'qq') :
        dat.loc[i, 'xx3'] = dat.loc[i - 1, 'xx3']

dat

I am wondering whether the second block of code, i.e.

for i in range(1, dat.shape[0]) :
    if (dat.loc[i, 'aa2'] == 'qq') :
        dat.loc[i, 'xx3'] = dat.loc[i - 1, 'xx3']

can be implemented as part of the method chain, as a continuation of the first block. That is, I am hoping for something like the following:

dat = (pd.DataFrame({'xx1' : [3,2,1], 'aa2' : ['qq', 'pp', 'qq'], 'xx3' : [4,5,6]})
        .sort_values(by = 'xx1')
        .reset_index(drop = True)
        ### implement the for loop here
     )

Any pointers would be very helpful.


Solution

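  • You can reassign xx3 by masking its values in the rows where aa2 is 'qq' and then forward-filling, so that each masked row takes the value from the row above it. This matches the loop even for consecutive 'qq' rows, because the loop reads the already-updated previous row. Since the loop starts at index 1, row 0 is excluded from the mask: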

    dat = (pd.DataFrame({'xx1' : [3,2,1], 'aa2' : ['qq', 'pp', 'qq'], 'xx3' : [4,5,6]})
            .sort_values(by = 'xx1')
            .reset_index(drop = True)
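            .assign(xx3 = lambda df: df['xx3']
                    .mask(df['aa2'].eq('qq') & (df.index != 0))  # blank out 'qq' rows, except row 0
                    .ffill()                                     # copy the previous row's value down
                    .astype(df['xx3'].dtype))                    # mask introduces NaN (float); restore the dtype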
          )
    

    Output:

       xx1 aa2  xx3
    0    1  qq    6
    1    2  pp    5
    2    3  qq    5
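
To check that the mask-and-ffill step really behaves like the loop, a small sketch along the following lines may help. The helper names `loop_version` and `chain_version` and the `sample` data are made up for illustration and are not part of the original question. It assumes a default 0..n-1 index (as produced by `reset_index(drop=True)`) and no missing values in xx3.

```python
import pandas as pd


def loop_version(df):
    # The original row-by-row loop, run on a copy so the input is left untouched.
    out = df.copy()
    for i in range(1, out.shape[0]):
        if out.loc[i, 'aa2'] == 'qq':
            out.loc[i, 'xx3'] = out.loc[i - 1, 'xx3']
    return out


def chain_version(df):
    # The mask + ffill step from the answer above.
    return df.assign(
        xx3=lambda d: d['xx3']
        .mask(d['aa2'].eq('qq') & (d.index != 0))
        .ffill()
        .astype(d['xx3'].dtype)
    )


# Hypothetical test data: a leading 'qq' row and two consecutive 'qq' rows.
sample = pd.DataFrame({
    'xx1': [1, 2, 3, 4, 5],
    'aa2': ['qq', 'pp', 'qq', 'qq', 'pp'],
    'xx3': [6, 5, 4, 3, 2],
})

via_loop = sample.pipe(loop_version)
via_chain = sample.pipe(chain_version)

print(via_chain)
print(via_loop.equals(via_chain))  # compares values and dtypes
```

As the `sample.pipe(loop_version)` call suggests, another option is to keep the loop itself in a function and add `.pipe(loop_version)` to the chain. That is slower on large frames than the vectorized version, but it works for loop logic that has no obvious vectorized equivalent.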