How can I select top n features of time series using tsfresh? Can I decide the number of top features I want to extract?
Building on the comment above from @Chaitra and on this answer, here is a fuller answer.
Yes, you can choose the number of top features by using the tsfresh
relevance table described in the documentation. Sort the table by p-value and take the top n
features.
Example code printing top 11 features:
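from tsfresh import extract_features
from tsfresh.feature_selection.relevance import calculate_relevance_table
from tsfresh.utilities.dataframe_functions import impute

# X is the long-format time series DataFrame; y is the target indexed by id
extracted_features = extract_features(
    X,
    column_id="id",
    column_kind="kind",
    column_value="value",
)
# Replace NaN/inf values in place, since the significance tests reject them
impute(extracted_features)

relevance_table = calculate_relevance_table(extracted_features, y)
# Keep only the features flagged as relevant, most significant first
relevance_table = relevance_table[relevance_table.relevant]
relevance_table = relevance_table.sort_values("p_value")
print(relevance_table["feature"].head(11))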
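To go from the printed names to an actual feature matrix with only the top n columns, something like the following sketch may help. It uses plain pandas and small made-up stand-ins for the relevance table and the extracted features, so the feature names, p-values, and the helper name `top_n_features` are all assumptions for illustration, not tsfresh output. Note that filtering on `relevant` first means you can get fewer than n features back.

```python
import pandas as pd


def top_n_features(relevance_table: pd.DataFrame, n: int) -> list:
    """Return the names of up to n relevant features with the smallest p-values."""
    relevant = relevance_table[relevance_table["relevant"]]
    # nsmallest orders by p-value ascending and keeps at most n rows
    return relevant.nsmallest(n, "p_value")["feature"].tolist()


# Hypothetical stand-in for the output of calculate_relevance_table
relevance_table = pd.DataFrame(
    {
        "feature": ["value__mean", "value__maximum", "value__median", "value__length"],
        "p_value": [0.001, 0.20, 0.0004, 0.03],
        "relevant": [True, False, True, True],
    }
)

# Hypothetical stand-in for the extracted feature matrix (one row per id)
extracted_features = pd.DataFrame(
    {
        "value__mean": [1.0, 2.0, 3.0],
        "value__maximum": [4.0, 5.0, 6.0],
        "value__median": [1.5, 2.5, 3.5],
        "value__length": [10.0, 10.0, 10.0],
    }
)

selected = top_n_features(relevance_table, n=2)
X_top = extracted_features[selected]  # feature matrix restricted to the top n columns
print(selected)
```

With a real relevance table you would pass it to the helper unchanged, since it only relies on the `feature`, `p_value`, and `relevant` columns used in the answer above.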