Is there a way to apply a window function per group in feature-engine? I have been reading the docs trying to work out how to do this. It seems like something that should exist, but I can't find how it's implemented. In pandas I would write:
import pandas as pd
# create a sample dataframe with groups
df = pd.DataFrame({
    'group': ['A', 'A', 'A', 'B', 'B', 'B', 'B', 'C', 'C', 'C', 'C'],
    'value': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
})
# group the data by the 'group' column and apply a rolling window mean of size 2
rolling_mean = df.groupby('group')['value'].rolling(window=2).mean()
print(rolling_mean)
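One thing to note about the pandas version: the result is indexed by (group, original row index), so it does not line up with df directly. A minimal sketch of attaching it back as a column, assuming the default integer index shown above; the column name value_rolling_2_mean is only an illustrative choice:

```python
import pandas as pd

df = pd.DataFrame({
    "group": ["A", "A", "A", "B", "B", "B", "B", "C", "C", "C", "C"],
    "value": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
})

# Rolling mean of size 2 within each group; the result has a MultiIndex
# of (group, original row index).
rolling_mean = df.groupby("group")["value"].rolling(window=2).mean()

# Drop the group level so the index matches df again, then attach the
# values as a new column (assign aligns on the index).
aligned = rolling_mean.reset_index(level=0, drop=True)
df_with_feature = df.assign(value_rolling_2_mean=aligned)  # illustrative name

print(df_with_feature)
```

The first row of each group is NaN because a window of 2 has no earlier row to draw on there.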
I am guessing the feature-engine version would look something like this:
from feature_engine.timeseries.forecasting import WindowFeatures
# parameter names below are my guess at what the API might look like
wf = WindowFeatures(
    window_size=3,
    variables=["value"],
    operation=["mean"],
    groupby_cols=["group"],
)
transformed_df = wf.fit_transform(df)
However, I can't find a group-by parameter (something like groupby_cols) in feature-engine.
It would be great to see other ways of standardising feature engineering for time series data like this, perhaps from sktime or any other framework too.
Since you want to apply this operation to each group individually, you can use pandas' groupby(...).apply:
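from feature_engine.timeseries.forecasting import WindowFeatures
# mean over a 3-row window; the result is shifted forward by one row (default periods=1)
wf = WindowFeatures(window=3, variables=["value"], functions=["mean"])
# fit and transform each group separately; same as
# pd.concat([wf.fit_transform(X) for _, X in df.groupby('group')])
out = df.groupby('group', group_keys=False).apply(wf.fit_transform)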
Output:
>>> out
   group  value  value_window_3_mean
0      A      1                  NaN
1      A      2                  NaN
2      A      3                  NaN
3      B      4                  NaN
4      B      5                  NaN
5      B      6                  NaN
6      B      7                  5.0
7      C      8                  NaN
8      C      9                  NaN
9      C     10                  NaN
10     C     11                  9.0
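For comparison, here is a rough plain-pandas sketch that reproduces the output above without feature-engine. It assumes the behaviour visible in that output: a rolling mean over 3 rows that is then shifted forward by one row within each group, so each value only uses earlier rows. The helper window_mean_per_group is a hypothetical name introduced for illustration, not part of either library.

```python
import pandas as pd

df = pd.DataFrame({
    "group": ["A", "A", "A", "B", "B", "B", "B", "C", "C", "C", "C"],
    "value": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
})


def window_mean_per_group(frame, group_col, value_col, window):
    """Hypothetical helper: per-group rolling mean, shifted by one row."""
    shifted_means = frame.groupby(group_col)[value_col].transform(
        # rolling mean inside the group, then shift so that row i only
        # sees rows i-window .. i-1 of the same group
        lambda s: s.rolling(window=window).mean().shift(1)
    )
    new_col = f"{value_col}_window_{window}_mean"
    return frame.assign(**{new_col: shifted_means})


out_pandas = window_mean_per_group(df, "group", "value", 3)
print(out_pandas)
```

Because transform returns a result aligned to the original index, no index cleanup is needed here. Group A has only three rows, so after the shift it has no complete window and stays NaN throughout.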