I am working with the following dataframe:
df = pd.DataFrame({"id": ['A', 'A', 'A', 'B', 'B', 'B', 'C','C' ],
"date": [pd.Timestamp(2015, 12, 30), pd.Timestamp(2016, 12, 30), pd.Timestamp(2018, 12, 30),pd.Timestamp(2015, 12, 30), pd.Timestamp(2016, 12, 30), pd.Timestamp(2018, 12, 30), pd.Timestamp(2016, 12, 30), pd.Timestamp(2019, 12, 30)],
"other_col": ['NA', 'NA', 'A444', 'NA', 'NA', 'B666', 'NA', 'C999'],
"other_col_1": [123, 123, 'NA', 0.765, 0.555, 'NA', 0.324, 'NA']})
What I want to achieve is: To backfill "other_col" entries for each corresponding group and to delete "other_col" when it is equal to 'NA' in "other_col_1".
I have tried groupby bfill() and ffill() df.groupby('id')['other_col'].bfill()
but it does't work.
The resulting dataframe should look like this:
df_new = pd.DataFrame({"id": ['A', 'A', 'B', 'B', 'C' ],
"date": [pd.Timestamp(2015, 12, 30), pd.Timestamp(2016, 12, 30), pd.Timestamp(2015, 12, 30), pd.Timestamp(2016, 12, 30), pd.Timestamp(2016, 12, 30)],
"other_col": ['A444', 'A444', 'B666', 'B666', 'C999'],
"other_col_1": [123, 123, 0.765, 0.555, 0.324]})
First, replace 'NA'
with a real NaN
value, then bfill
:
df = df.replace('NA', np.nan)
df = df.bfill()[df['other_col_1'].notna()]
Output:
>>> df
id date other_col other_col_1
0 A 2015-12-30 A444 123.000
1 A 2016-12-30 A444 123.000
3 B 2015-12-30 B666 0.765
4 B 2016-12-30 B666 0.555
6 C 2016-12-30 C999 0.324