Code example:
import featuretools as ft
es = ft.demo.load_mock_customer(return_entityset=True)
# Normalized one more time
es = es.normalize_entity(
new_entity_id="device",
base_entity_id="sessions",
index="device",
)
feature_matrix, feature_defs = ft.dfs(
entityset=es,
target_entity="customers",
agg_primitives=["std",],
groupby_trans_primitives=['cum_count'],
max_depth=2
)
And I'd like to look into STD(sessions.CUM_COUNT(device) by customer_id)
feature deeper:
And I tried to generate this feature manually but had different result:
df = ft.demo.load_mock_customer(return_single_table=True)
a = df.groupby("customer_id")['device'].cumcount()
a.name = "cumcount_device"
a = pd.concat([df, a], axis=1)
b = a.groupby("customer_id")['cumcount_device'].std()
>>> b
customer_id
1 36.517
2 26.991
3 26.991
4 31.610
5 22.949
Name: cumcount_device, dtype: float64
What am I missing?
Thanks for the question. The calculation needs to be based on the data frame from sessions.
df = es['sessions'].df
cumcount = df['device'].groupby(df['customer_id']).cumcount()
std = cumcount.groupby(df['customer_id']).std()
std.round(3).loc[feature_matrix.index]
customer_id
5 1.871
4 2.449
1 2.449
3 1.871
2 2.160
dtype: float64
You should get the same output as in DFS.