pythonpandasdataframeffill

How to forward fill null values of one column from the values of another column?


I am trying to fill the null values within column 'beginning_daily_count' with the previous index value from the 'end_daily_count'. The starting dataset would be:

d = {
    'id': [1, 1, 1, 1, 1, 2, 2, 2, 2],
    'beginning_daily_count': [30, 33, 37, 46, None, 7, 1, None, 2],
    'end_daily_count': [33, 37, 46, 52, 33, 7, 1, 2, 3],
    'foils': [0, 0, 0, 0, 0, 0, 0, 1, 1]
}

and the desired dataset would be:

d = {
    'id': [1, 1, 1, 1, 1, 2, 2, 2, 2],
    'beginning_daily_count': [30, 33, 37, 46, 52, 33, 1, 1, 2],
    'end_daily_count': [33, 37, 46, 52, 33, 7, 1, 2, 3],
    'foils': [0, 0, 0, 0, 0, 0, 0, 1, 1]
}

I have attempted the following ffill() and iloc() methods, but to no avail. I admittedly have little experience with ffill and iloc.

d.iloc[beginning_daily_count.isna()].values = d.iloc[d.end_daily_count- 1].values
d['beginning_daily_count'].transform(lambda x: x.ffill(d['end_daily_count']))

Solution

  • You can fillna the column with the shifted other column per group (using GroupBy.shift to avoid leaking values from one group to the next one):

    df['beginning_daily_count'] = (df['beginning_daily_count']
                                   .fillna(df.groupby('id')['end_daily_count'].shift(),
                                           downcast='infer')
                                  )
    

    output:

       id  beginning_daily_count  end_daily_count  foils
    0   1                     30               33      0
    1   1                     33               37      0
    2   1                     37               46      0
    3   1                     46               52      0
    4   1                     52               33      0
    5   2                      7                7      0
    6   2                      1                1      0
    7   2                      1                2      1
    8   2                      2                3      1