pythonpandasgroup-byffill

I want to drag a column value which are in integer with respect to another column


Below is my data frame.

df = pd.DataFrame({'vin':['aaa','aaa','aaa','bbb','bbb','bbb','ddd','eee','eee','fff'],
                'c1':[35,'NA','NA','NA',4,'NA','NA','NA',56,406],
                'c2':[35,'NA','NA','NA',43,'NA','NA','NA',67,77],
                'c3':[36,'NA','NA','NA',104,'NA','NA','NA',44,306]})

I want to drag integer values of 'c1' 'c2', and 'c3' with respect to column 'vin'.

Below is the expected output:-

df = pd.DataFrame({'vin':['aaa','aaa','aaa','bbb','bbb','bbb','ddd','eee','eee','fff'],
                'c1':[35,35,35,4,4,4,'NA',56,56,406],
                'c2':[35,35,35,43,43,43,'NA',67,67,77],
                'c3':[36,36,36,104,104,104,'NA',44,44,306]})

Note: The value of 'vin' column 'ddd' will remain as 'NA' as there is no integer value is available for 'ddd'.

I tried ffill() or groupbyffill() but I don't get the expected output. Below are lines of code, I tried.

#(1)
df['c1'] = df.groupby(['vin'])['c1'].ffill()


#(2)
df[["c1","c2" "c3"]] = df[["c1", "c2","c3"]].fillna(df.groupby(['vin'])[["c1", "c2","c3"]].ffill())


#(3)
df["c1"] = df.groupby('vin')['c1'].transform(lambda x: x.ffill())

Solution

  • If there is one non missing value per group with ffill first sorting clumns:

    c = ["c1","c2", "c3"]
    df[c] = df.replace('NA',np.nan).sort_values(['vin', 'c1']).groupby('vin')[c].ffill()
    print (df)
       vin     c1    c2     c3
    0  aaa   35.0  35.0   36.0
    1  aaa   35.0  35.0   36.0
    2  aaa   35.0  35.0   36.0
    3  bbb    4.0  43.0  104.0
    4  bbb    4.0  43.0  104.0
    5  bbb    4.0  43.0  104.0
    6  ddd    NaN   NaN    NaN
    7  eee   56.0  67.0   44.0
    8  eee   56.0  67.0   44.0
    9  fff  406.0  77.0  306.0
    

    Or aggregate first non missing values for new columns in GroupBy.transform:

    c = ["c1","c2", "c3"]
    df[c] = df.replace('NA',np.nan).groupby('vin')[c].transform('first')
    print (df)
       vin     c1    c2     c3
    0  aaa   35.0  35.0   36.0
    1  aaa   35.0  35.0   36.0
    2  aaa   35.0  35.0   36.0
    3  bbb    4.0  43.0  104.0
    4  bbb    4.0  43.0  104.0
    5  bbb    4.0  43.0  104.0
    6  ddd    NaN   NaN    NaN
    7  eee   56.0  67.0   44.0
    8  eee   56.0  67.0   44.0
    9  fff  406.0  77.0  306.0