pythonpandasresample

python pandas resample how to let last fill last null value?


I have a time series like below,

ts
Out[20]: 
time
2023-08-01 10:31:40.110    6.22
2023-08-01 10:31:43.110    6.23
2023-08-01 10:31:46.111    6.23
2023-08-01 10:31:49.111    6.24
2023-08-01 10:31:52.111    6.24
2023-08-01 10:31:55.117    6.25
2023-08-01 10:31:58.112     NaN
2023-08-01 10:32:01.112     NaN
2023-08-01 10:32:04.117     NaN
2023-08-01 10:32:07.095     NaN

when I do ts.resample('6S', closed='left', label='right').last()

ts.resample('6S', closed='left', label='right').last()
Out[21]: 
time
2023-08-01 10:31:42    6.22
2023-08-01 10:31:48    6.23
2023-08-01 10:31:54    6.24
2023-08-01 10:32:00    6.25
2023-08-01 10:32:06     NaN
2023-08-01 10:32:12     NaN

The problem is I actually want 2023-08-01 10:32:00 to be filled with whatever value previous, no matter it is a numeric or null, in this case, should be NaN How to do that? Great thanks.

To get series above,

pd.Series( [6.22, 6.23, 6.23, 6.24, 6.24, 6.25, np.nan, np.nan, np.nan, np.nan], index = pd.DatetimeIndex(['2023-08-01 10:31:40.110000', '2023-08-01 10:31:43.110000',
               '2023-08-01 10:31:46.111000', '2023-08-01 10:31:49.111000',
               '2023-08-01 10:31:52.111000', '2023-08-01 10:31:55.117000',
               '2023-08-01 10:31:58.112000', '2023-08-01 10:32:01.112000',
               '2023-08-01 10:32:04.117000', '2023-08-01 10:32:07.095000'],
              dtype='datetime64[ns]', name='exchange_time', freq=None))

Solution

  • last is designed to get the last non-NA value.

    Compute the last non-null entry of each column

    It looks like you need agg/apply:

    ts.resample('6S', closed='left', label='right').agg(lambda x: x.iloc[-1])
    
    ts.resample('6S', closed='left', label='right').apply(lambda x: x.iloc[-1])
    

    Output:

    exchange_time
    2023-08-01 10:31:42    6.22
    2023-08-01 10:31:48    6.23
    2023-08-01 10:31:54    6.24
    2023-08-01 10:32:00     NaN
    2023-08-01 10:32:06     NaN
    2023-08-01 10:32:12     NaN
    Freq: 6S, dtype: float64
    

    If you want to handle gaps:

    (ts.resample('6S', closed='left', label='right')
       .agg(lambda x: x.iloc[-1] if len(x) else np.nan)
    )
    
    # or
    import numpy as np
    
    (ts.resample('6S', closed='left', label='right')
       .agg(lambda x: np.r_[np.nan, x][-1])
    )
    
    # or
    (ts.resample('6S', closed='left', label='right')
       .agg(lambda x: next(iter(x.iloc[-1:]), np.nan))
    )
    

    Example:

    exchange_time
    2023-08-01 10:31:42    6.22
    2023-08-01 10:31:48    6.23
    2023-08-01 10:31:54    6.24
    2023-08-01 10:32:00     NaN
    2023-08-01 10:32:06     NaN
    2023-08-01 10:32:12     NaN
    2023-08-01 10:32:18     NaN
    2023-08-01 10:32:24     NaN
    2023-08-01 10:32:30     NaN
    2023-08-01 10:32:36     NaN
    2023-08-01 10:32:42     NaN
    2023-08-01 10:32:48     NaN
    2023-08-01 10:32:54     NaN
    2023-08-01 10:33:00     NaN
    2023-08-01 10:33:06     NaN
    2023-08-01 10:33:12     NaN
    Freq: 6S, dtype: float64
    

    Alternative input:

    exchange_time
    2023-08-01 10:31:40.110    6.22
    2023-08-01 10:31:43.110    6.23
    2023-08-01 10:31:46.111    6.23
    2023-08-01 10:31:49.111    6.24
    2023-08-01 10:31:52.111    6.24
    2023-08-01 10:31:55.117    6.25
    2023-08-01 10:31:58.112     NaN
    2023-08-01 10:32:01.112     NaN
    2023-08-01 10:32:04.117     NaN
    2023-08-01 10:32:07.095     NaN
    2023-08-01 10:33:07.095     NaN
    dtype: float64