pythonpandaswindowing

Window of full weeks in pandas


I am looking for a special window function in pandas: sort of a combination of rolling and expanding. For calculating (for instance) the mean and standard deviating, I want to regard all past data, but ignore the first few records to make sure I have a multiple of 7 (days in my case). That's because I know the data has a strong weekly pattern.

Example:

s = pd.Series([1, 3, 4, 5, 4, 3, 1, 2, 4, 5, 4, 5, 4, 2, 1, 3, 4, 5, 4, 3, 1, 3],
              pd.date_range('2020-01-01', '2020-01-22'))
s.rolling(7, 7).mean()   # Use last 7 days.
s.expanding(7).mean()    # Use all past days.
s.mywindowing(7).mean()  # Use last past multiple of 7 days. How?

The effect should be like this: Manual exercise in Excel

Of course I can do things manually using for loops and such, but I imagine the existing pandas machinery can be used to do this...?


Solution

  • Pandas custom window rolling

    another usage here

    import pandas as pd
    import numpy as np
    from pandas.api.indexers import BaseIndexer
    from typing import Optional, Tuple
    
    
    class CustomIndexer(BaseIndexer):
        def get_window_bounds(self,
                              num_values: int = 0,
                              min_periods: Optional[int] = None,
                              center: Optional[bool] = None,
                              closed: Optional[str] = None
                              ) -> Tuple[np.ndarray, np.ndarray]:
    
            end = np.arange(1, num_values+1, dtype=np.int64)
            start = end % 7
            return start, end
    
    indexer = CustomIndexer(num_values=len(s))
    s.rolling(indexer).mean().round(2)
    

    Outputs:

    2020-01-01     NaN
    2020-01-02     NaN
    2020-01-03     NaN
    2020-01-04     NaN
    2020-01-05     NaN
    2020-01-06     NaN
    2020-01-07    3.00
    2020-01-08    3.14
    2020-01-09    3.29
    2020-01-10    3.43
    2020-01-11    3.29
    2020-01-12    3.43
    2020-01-13    3.57
    2020-01-14    3.36
    2020-01-15    3.36
    2020-01-16    3.36
    2020-01-17    3.36
    2020-01-18    3.36
    2020-01-19    3.36
    2020-01-20    3.36
    2020-01-21    3.24
    2020-01-22    3.33
    Freq: D, dtype: float64