As raw data we have measurements m_{i,j}, measured every 30 seconds (i = 0, 30, 60, 90, ..., 720, ...) for every subject j in the dataset.
I wish to use the TSFRESH package to extract time-series features, such that for a point of interest at time i the features are calculated over a symmetric rolling window: the feature vector of time point (i, j) should be based on the measurements from the 3 hours of context before i and the 3 hours after i.

Thus every point of interest is surrounded by 6 hours of "context", i.e. 360 measurements before and 360 measurements after it, and for every point of interest the features should be extracted from that window of 721 measurements of m_{i,j}.
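For concreteness, a toy version of the input layout this assumes (the column names here are placeholders; id_column, sort_column and the 'activity' value column used in the code below refer to them):

import pandas as pd

# one row per (subject, time) measurement, sampled every 30 seconds
activity_data = pd.DataFrame({
    "subject":  ["A"] * 5 + ["B"] * 5,
    "time":     [0, 30, 60, 90, 120] * 2,
    "activity": [1.0, 1.2, 0.9, 1.1, 1.3, 2.0, 2.1, 1.9, 2.2, 2.0],
})

id_column = "subject"   # identifies subject j
sort_column = "time"    # timestamp i, in seconds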
I've tried using the rolling_direction parameter of roll_time_series(), but the only options are to roll either backwards or forwards in "time" - I'm looking for a way to include both "past" and "future" data in the feature calculation.
A "workaround" solution:
Use the "roll_time_series
" function twice; one for "backward" rolling (setting rolling_direction=1
) and the second for "forward" (rolling_direction=-1
), and then combine them into one.
This will provide, for each time point in the original dataset m_{i,j}
$, a time series rolling object with 360 values "from the past" and 360 values "from the future" (i.e., the time point is at the center of the window and max_timeshift=360
)
Note to the use of pandas
functions below: concat(), sort_values(), drop_duplicates()
- which are mandatory for this solution to work.
import numpy as np
import pandas as pd
from tsfresh.utilities.dataframe_functions import roll_time_series
from tsfresh.feature_extraction import EfficientFCParameters, MinimalFCParameters

# "backward" rolling: each window ends at the point of interest
rolled_backward = roll_time_series(activity_data,
                                   column_id=id_column,
                                   column_sort=sort_column,
                                   column_kind=None,
                                   rolling_direction=1,
                                   max_timeshift=360)

# "forward" rolling: each window starts at the point of interest
rolled_forward = roll_time_series(activity_data,
                                  column_id=id_column,
                                  column_sort=sort_column,
                                  column_kind=None,
                                  rolling_direction=-1,
                                  max_timeshift=360)

# merge into one dataframe, so every time point (sample) gets both its backward and forward window
df = pd.concat([rolled_backward, rolled_forward])

# important! - sort, and drop the duplicated point of interest
# (it appears once in the backward window and once in the forward window)
df.sort_values(by=[id_column, sort_column], inplace=True)
df.drop_duplicates(subset=[id_column, sort_column, 'activity'], inplace=True, keep='first')
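The EfficientFCParameters / MinimalFCParameters imports above suggest the next step is tsfresh feature extraction on the combined rolled dataframe. A minimal sketch of that step, assuming the measurement column is named 'activity' (as in the drop_duplicates call) and using MinimalFCParameters to keep the run cheap:

from tsfresh import extract_features

# one feature vector per rolled window, i.e. per (subject, point-of-interest) pair
features = extract_features(df,
                            column_id=id_column,
                            column_sort=sort_column,
                            column_value='activity',
                            default_fc_parameters=MinimalFCParameters())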