I have the following dataframe:
df = pd.DataFrame({"A": ["sell", np.nan, np.nan, np.nan, np.nan, "buy", np.nan, np.nan, np.nan, np.nan],
"B": ["buy", "buy", "sell", "buy", "buy", np.nan, "buy", "buy", "buy", np.nan],
"C": ["sell", "sell", "sell", "sell", "buy", "sell", "sell", "buy", "buy", np.nan]},
index=pd.date_range("2025-05-22", periods=10, freq="15min"))
print df
A B C
2025-05-22 00:00:00 sell buy sell
2025-05-22 00:15:00 NaN buy sell
2025-05-22 00:30:00 NaN sell sell
2025-05-22 00:45:00 NaN buy sell
2025-05-22 01:00:00 NaN buy buy
2025-05-22 01:15:00 buy NaN sell
2025-05-22 01:30:00 NaN buy sell
2025-05-22 01:45:00 NaN buy buy
2025-05-22 02:00:00 NaN buy buy
2025-05-22 02:15:00 NaN NaN buy
I want to forward fill the NaN in "A" column until all 3 columns are equals. It should look like this
A B C
2025-05-22 00:00:00 sell buy sell
2025-05-22 00:15:00 sell buy sell
2025-05-22 00:30:00 sell sell sell
2025-05-22 00:45:00 NaN buy sell
2025-05-22 01:00:00 NaN buy buy
2025-05-22 01:15:00 buy NaN sell
2025-05-22 01:30:00 buy buy sell
2025-05-22 01:45:00 buy buy buy
2025-05-22 02:00:00 NaN buy buy
2025-05-22 02:15:00 NaN NaN buy
Any suggestion will be greatly appreciated
A possible solution:
def fill(df):
# infer_objects because the first elem of A may be NA
s = df['A'].infer_objects(copy=False).ffill()
idx = (s.eq(df['B']) & df['B'].eq(df['C'])).idxmax()
df.loc[:idx, 'A'] = s[:idx]
return df
cond = df['A'].notna().cumsum()
df.groupby(cond, as_index=False).apply(fill).reset_index(level=0, drop=True)
It first constructs a grouping key with cond = df['A'].notna().cumsum()
, where notna
marks non-missing values in A
, and cumsum
creates a unique group label for each contiguous block of rows following a non-null value. Then, the dataframe is grouped with groupby
using this key and processed by the fill
function. Inside fill
, ffill
forward-fills A
, and the index where A
, B
, and C
first become equal is found via boolean comparisons and idxmax
. The column A
is then updated up to that point within the group. Finally, reset_index(level=0, drop=True)
removes the added group index.
Output:
A B C
2025-05-22 00:00:00 sell buy sell
2025-05-22 00:15:00 sell buy sell
2025-05-22 00:30:00 sell sell sell
2025-05-22 00:45:00 NaN buy sell
2025-05-22 01:00:00 NaN buy buy
2025-05-22 01:15:00 buy NaN sell
2025-05-22 01:30:00 buy buy sell
2025-05-22 01:45:00 buy buy buy
2025-05-22 02:00:00 NaN buy buy
2025-05-22 02:15:00 NaN NaN NaN