I am trying to down-sample a time series in Pandas from 8 seconds to 10 seconds. For the purposes of this example, I've generated fake data that linearly increases with the number of seconds, over a minute. Importantly, for this example, the time intervals of the two time series are not multiples of each other.
When using .resample().interpolate() in Pandas, it seems unable to interpolate for the first few points, for which there is sufficient data. How can I work around it? Here's the example:
import numpy as np
import pandas as pd
import datetime
a = datetime.datetime(2025, 12, 2, 17, 39, 6)
interval8df = pd.DataFrame(np.linspace(60, 124, 9), columns=['Hi'], index=pd.date_range(a, periods=9, freq='8s'))
interval8df['Hi']
2025-12-02 17:39:06 60.0
2025-12-02 17:39:14 68.0
2025-12-02 17:39:22 76.0
2025-12-02 17:39:30 84.0
2025-12-02 17:39:38 92.0
2025-12-02 17:39:46 100.0
2025-12-02 17:39:54 108.0
2025-12-02 17:40:02 116.0
2025-12-02 17:40:10 124.0
Freq: 8s, Name: Hi, dtype: float64
When using resample interpolate, this is the result:
interval8df.resample('10s').interpolate(method='time')['Hi']
2025-12-02 17:39:00 NaN
2025-12-02 17:39:10 NaN
2025-12-02 17:39:20 NaN
2025-12-02 17:39:30 84.0
2025-12-02 17:39:40 94.0
2025-12-02 17:39:50 104.0
2025-12-02 17:40:00 114.0
2025-12-02 17:40:10 124.0
Freq: 10s, Name: Hi, dtype: float64
While I can understand the first point, 17:39:00, being NaN, both 17:39:10 and 17:39:20 are surrounded by points in the original time series (6 and 14 seconds, then 14 and 22 seconds respectively). Why is this occurring?
I've also tried using mean, which produces no NaNs:
interval8df.resample('10s').mean()['Hi']
2025-12-02 17:39:00 60.0
2025-12-02 17:39:10 68.0
2025-12-02 17:39:20 76.0
2025-12-02 17:39:30 88.0
2025-12-02 17:39:40 100.0
2025-12-02 17:39:50 108.0
2025-12-02 17:40:00 116.0
2025-12-02 17:40:10 124.0
Freq: 10s, Name: Hi, dtype: float64
Additionally, changing the interpolate method does not seem to improve the result.
The workaround I've been using is up-sampling from 8 seconds to 1 second using interpolate, then down-sampling from 1 second to 10 seconds using the mean, which is obviously clunky. I would like to be able to do this directly in one step.
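For reference, a minimal sketch of that two-step workaround, rebuilding the same example data (the names per_second and two_step are only for illustration). Note that it returns 10-second bin averages rather than values interpolated at the 10-second marks:

```python
import numpy as np
import pandas as pd

start = pd.Timestamp('2025-12-02 17:39:06')
interval8df = pd.DataFrame(
    np.linspace(60, 124, 9),
    columns=['Hi'],
    index=pd.date_range(start, periods=9, freq='8s'),
)

# Step 1: up-sample to 1 s. Every original timestamp lies on the 1 s grid,
# so all original points are kept and the gaps are interpolated.
per_second = interval8df.resample('1s').interpolate(method='time')

# Step 2: down-sample to 10 s bins by averaging the 1 s values in each bin.
two_step = per_second.resample('10s').mean()
print(two_step['Hi'])
```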
To see what is happening, let's add asfreq after the resample, so you can see what is passed in to the next chained function:
interval8df.resample('10s').asfreq()
Output:
Hi
2025-12-02 17:39:00 NaN
2025-12-02 17:39:10 NaN
2025-12-02 17:39:20 NaN
2025-12-02 17:39:30 84.0
2025-12-02 17:39:40 NaN
2025-12-02 17:39:50 NaN
2025-12-02 17:40:00 NaN
2025-12-02 17:40:10 124.0
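Only the original timestamps that fall exactly on the 10s grid (17:39:30 and 17:40:10) survive; everything else is NaN. The interpolation runs on this frame, so there is no known value before 17:39:30 to act as a lower bound, hence the nulls for seconds 00, 10 and 20. With mean, no interpolation takes place: you are just taking the mean of the values in each 10s window. Since every 10s interval contains at least one value, a mean is returned for each one.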
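One possible way to get the interpolated values in a single pass is to reindex onto the union of the original timestamps and the target 10s grid, interpolate on time, and then keep only the grid points. This is a sketch rather than the only option; the names target, combined and result are mine, and the grid is assumed to start at the first timestamp floored to 10s.

```python
import numpy as np
import pandas as pd

start = pd.Timestamp('2025-12-02 17:39:06')
interval8df = pd.DataFrame(
    np.linspace(60, 124, 9),
    columns=['Hi'],
    index=pd.date_range(start, periods=9, freq='8s'),
)

# Target 10 s grid spanning the original data (assumed alignment: floor/ceil to 10 s).
target = pd.date_range(
    interval8df.index.min().floor('10s'),
    interval8df.index.max().ceil('10s'),
    freq='10s',
)

# Keep all original points AND add the target timestamps as NaN rows.
combined = interval8df.reindex(interval8df.index.union(target))

# Interpolate using the actual time gaps, then keep only the 10 s grid.
result = combined.interpolate(method='time').loc[target]
print(result['Hi'])
```

Here 17:39:10 and 17:39:20 are filled from their 8s neighbours, while 17:39:00 stays NaN because no original point precedes it.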