I am looking for a special window function in pandas: sort of a combination of rolling and expanding. For calculating (for instance) the mean and standard deviating, I want to regard all past data, but ignore the first few records to make sure I have a multiple of 7 (days in my case). That's because I know the data has a strong weekly pattern.
Example:
s = pd.Series([1, 3, 4, 5, 4, 3, 1, 2, 4, 5, 4, 5, 4, 2, 1, 3, 4, 5, 4, 3, 1, 3],
pd.date_range('2020-01-01', '2020-01-22'))
s.rolling(7, 7).mean() # Use last 7 days.
s.expanding(7).mean() # Use all past days.
s.mywindowing(7).mean() # Use last past multiple of 7 days. How?
The effect should be like this:
Of course I can do things manually using for
loops and such, but I imagine the existing pandas machinery can be used to do this...?
another usage here
import pandas as pd
import numpy as np
from pandas.api.indexers import BaseIndexer
from typing import Optional, Tuple
class CustomIndexer(BaseIndexer):
def get_window_bounds(self,
num_values: int = 0,
min_periods: Optional[int] = None,
center: Optional[bool] = None,
closed: Optional[str] = None
) -> Tuple[np.ndarray, np.ndarray]:
end = np.arange(1, num_values+1, dtype=np.int64)
start = end % 7
return start, end
indexer = CustomIndexer(num_values=len(s))
s.rolling(indexer).mean().round(2)
Outputs:
2020-01-01 NaN
2020-01-02 NaN
2020-01-03 NaN
2020-01-04 NaN
2020-01-05 NaN
2020-01-06 NaN
2020-01-07 3.00
2020-01-08 3.14
2020-01-09 3.29
2020-01-10 3.43
2020-01-11 3.29
2020-01-12 3.43
2020-01-13 3.57
2020-01-14 3.36
2020-01-15 3.36
2020-01-16 3.36
2020-01-17 3.36
2020-01-18 3.36
2020-01-19 3.36
2020-01-20 3.36
2020-01-21 3.24
2020-01-22 3.33
Freq: D, dtype: float64