pythonpandas

pandas filling value until conditions in met


I have the following dataframe:

df = pd.DataFrame({"A": ["sell", np.nan, np.nan, np.nan, np.nan, "buy", np.nan, np.nan, np.nan, np.nan],
                   "B": ["buy", "buy", "sell", "buy", "buy", np.nan, "buy", "buy", "buy", np.nan],
                   "C": ["sell", "sell", "sell", "sell", "buy", "sell", "sell", "buy", "buy", np.nan]},
                   index=pd.date_range("2025-05-22", periods=10, freq="15min"))


print df

                        A     B     C
2025-05-22 00:00:00  sell   buy  sell
2025-05-22 00:15:00   NaN   buy  sell
2025-05-22 00:30:00   NaN  sell  sell
2025-05-22 00:45:00   NaN   buy  sell
2025-05-22 01:00:00   NaN   buy   buy
2025-05-22 01:15:00   buy   NaN  sell
2025-05-22 01:30:00   NaN   buy  sell
2025-05-22 01:45:00   NaN   buy   buy
2025-05-22 02:00:00   NaN   buy   buy
2025-05-22 02:15:00   NaN   NaN   buy

I want to forward fill the NaN in "A" column until all 3 columns are equals. It should look like this

                        A     B     C
2025-05-22 00:00:00  sell   buy  sell
2025-05-22 00:15:00  sell   buy  sell
2025-05-22 00:30:00  sell  sell  sell
2025-05-22 00:45:00   NaN   buy  sell
2025-05-22 01:00:00   NaN   buy   buy
2025-05-22 01:15:00   buy   NaN  sell
2025-05-22 01:30:00   buy   buy  sell
2025-05-22 01:45:00   buy   buy   buy
2025-05-22 02:00:00   NaN   buy   buy
2025-05-22 02:15:00   NaN   NaN   buy 

Any suggestion will be greatly appreciated


Solution

  • A possible solution:

    def fill(df):
        # infer_objects because the first elem of A may be NA
        s = df['A'].infer_objects(copy=False).ffill()
        idx = (s.eq(df['B']) & df['B'].eq(df['C'])).idxmax()
        df.loc[:idx, 'A'] = s[:idx]
        return df
    
    cond = df['A'].notna().cumsum()
    df.groupby(cond, as_index=False).apply(fill).reset_index(level=0, drop=True)
    

    It first constructs a grouping key with cond = df['A'].notna().cumsum(), where notna marks non-missing values in A, and cumsum creates a unique group label for each contiguous block of rows following a non-null value. Then, the dataframe is grouped with groupby using this key and processed by the fill function. Inside fill, ffill forward-fills A, and the index where A, B, and C first become equal is found via boolean comparisons and idxmax. The column A is then updated up to that point within the group. Finally, reset_index(level=0, drop=True) removes the added group index.

    Output:

                            A     B     C
    2025-05-22 00:00:00  sell   buy  sell
    2025-05-22 00:15:00  sell   buy  sell
    2025-05-22 00:30:00  sell  sell  sell
    2025-05-22 00:45:00   NaN   buy  sell
    2025-05-22 01:00:00   NaN   buy   buy
    2025-05-22 01:15:00   buy   NaN  sell
    2025-05-22 01:30:00   buy   buy  sell
    2025-05-22 01:45:00   buy   buy   buy
    2025-05-22 02:00:00   NaN   buy   buy
    2025-05-22 02:15:00   NaN   NaN   NaN