pythonpandasdataframeffill

backfill a column based on a group pandas


I am working with the following dataframe:

df = pd.DataFrame({"id": ['A', 'A', 'A', 'B', 'B', 'B', 'C','C'  ], 
            "date": [pd.Timestamp(2015, 12, 30), pd.Timestamp(2016, 12, 30), pd.Timestamp(2018, 12, 30),pd.Timestamp(2015, 12, 30), pd.Timestamp(2016, 12, 30), pd.Timestamp(2018, 12, 30), pd.Timestamp(2016, 12, 30), pd.Timestamp(2019, 12, 30)], 
            "other_col": ['NA', 'NA', 'A444', 'NA', 'NA', 'B666', 'NA', 'C999'],
            "other_col_1": [123, 123, 'NA', 0.765, 0.555, 'NA', 0.324, 'NA']})

What I want to achieve is: To backfill "other_col" entries for each corresponding group and to delete "other_col" when it is equal to 'NA' in "other_col_1".

I have tried groupby bfill() and ffill() df.groupby('id')['other_col'].bfill() but it does't work.

The resulting dataframe should look like this:

df_new = pd.DataFrame({"id": ['A', 'A', 'B', 'B', 'C' ], 
            "date": [pd.Timestamp(2015, 12, 30), pd.Timestamp(2016, 12, 30), pd.Timestamp(2015, 12, 30), pd.Timestamp(2016, 12, 30), pd.Timestamp(2016, 12, 30)], 
            "other_col": ['A444', 'A444', 'B666', 'B666', 'C999'],
            "other_col_1": [123, 123, 0.765, 0.555, 0.324]})

Solution

  • First, replace 'NA' with a real NaN value, then bfill:

    df = df.replace('NA', np.nan)
    df = df.bfill()[df['other_col_1'].notna()]
    

    Output:

    >>> df
      id       date other_col  other_col_1
    0  A 2015-12-30      A444      123.000
    1  A 2016-12-30      A444      123.000
    3  B 2015-12-30      B666        0.765
    4  B 2016-12-30      B666        0.555
    6  C 2016-12-30      C999        0.324