Example:
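import pandas as pd
import featuretools as ft

# Approach 1: derive the items entity from sales with normalize_entity()
buy_log_df = pd.DataFrame(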
[
["2020-01-01", 0, 1, 2, 2, 200],
["2020-01-02", 1, 1, 1, 3, 100],
["2020-01-02", 2, 2, 1, 1, 100],
["2020-01-03", 3, 3, 3, 1, 300],
],
columns=['date', 'sale_id', 'customer_id', "item_id", "quantity", "price"]
)
es = ft.EntitySet(id="sale_set")
es = es.entity_from_dataframe(
"sales",
dataframe=buy_log_df,
index="sale_id",
time_index='date'
)
es = es.normalize_entity(
new_entity_id="items",
base_entity_id="sales",
index="item_id",
additional_variables=["price"],
)
# Approach 2: create both entities and the relationship manually
buy_log_df = pd.DataFrame(
[
["2020-01-01", 0, 1, 2, 2],
["2020-01-02", 1, 1, 1, 3],
["2020-01-02", 2, 2, 1, 1],
["2020-01-03", 3, 3, 3, 1],
],
columns=['date', 'sale_id', 'customer_id', 'item_id', 'quantity']
)
item_df = pd.DataFrame(
[
[1, 100],
[2, 200],
[3, 300],
],
columns=['item_id', 'price']
)
es = ft.EntitySet(id="sale_set")
es = es.entity_from_dataframe(
"sales",
dataframe=buy_log_df,
index="sale_id",
time_index='date'
)
es = es.entity_from_dataframe(
"items",
dataframe=item_df,
index="item_id",
)
from featuretools import Relationship
# Link each sale to its item (parent: items.item_id, child: sales.item_id)
es = es.add_relationships(
[Relationship(es['items']['item_id'], es['sales']['item_id'])],
)
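When the parent table is written by hand, its keys are not guaranteed to line up with the child the way they are when normalize_entity() derives the parent from the child's own values. A quick pandas check (a sketch, not part of the featuretools API) can confirm that item_df.item_id is unique and that every item_id in the sales log has a matching item before the relationship is added.

```python
import pandas as pd

# Tables from the manual approach above.
buy_log_df = pd.DataFrame(
    [
        ["2020-01-01", 0, 1, 2, 2],
        ["2020-01-02", 1, 1, 1, 3],
        ["2020-01-02", 2, 2, 1, 1],
        ["2020-01-03", 3, 3, 3, 1],
    ],
    columns=["date", "sale_id", "customer_id", "item_id", "quantity"],
)
item_df = pd.DataFrame([[1, 100], [2, 200], [3, 300]], columns=["item_id", "price"])

# The parent key should be unique ...
parent_key_unique = item_df["item_id"].is_unique
# ... and every child key should have a matching parent row.
orphans = buy_log_df.loc[~buy_log_df["item_id"].isin(item_df["item_id"]), "item_id"]

print(parent_key_unique)  # True
print(orphans.tolist())   # []
```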
It looks like the `es` produced by the two snippets above is the same.
Is there a specific case where ONLY normalize_entity() can be used?
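Thanks for the question. That's correct: the two entity sets are equivalent. There are no cases where only normalize_entity()
can be used. Everything the method does, such as creating the new entity and adding the relationship, can also be done manually.
One small difference: when the base entity has a time index, normalize_entity() by default also gives the new entity a time index
(here `first_sales_time`); pass make_time_index=False to skip it.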
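To make "can also be done manually" concrete on the data side, the sketch below splits `price` out of the original purchase log with plain pandas and checks that the result matches the hand-written `item_df` and the price-less `buy_log_df`. The helper `split_out_parent` is hypothetical: it only approximates what normalize_entity() does to the dataframes (keeping the first value seen per key) and does not build an EntitySet, add a relationship, or create the default time index.

```python
import pandas as pd

# Original purchase log, including the price column.
buy_log_df = pd.DataFrame(
    [
        ["2020-01-01", 0, 1, 2, 2, 200],
        ["2020-01-02", 1, 1, 1, 3, 100],
        ["2020-01-02", 2, 2, 1, 1, 100],
        ["2020-01-03", 3, 3, 3, 1, 300],
    ],
    columns=["date", "sale_id", "customer_id", "item_id", "quantity", "price"],
)


def split_out_parent(df, index, additional_columns):
    """Hypothetical helper: move `additional_columns` into a parent table
    keyed by `index`, keeping the first value seen for each key."""
    parent = (
        df[[index] + additional_columns]
        .drop_duplicates(subset=index, keep="first")
        .sort_values(index)
        .reset_index(drop=True)
    )
    child = df.drop(columns=additional_columns)
    return child, parent


sales, items = split_out_parent(buy_log_df, "item_id", ["price"])

# The manually written tables from the question.
manual_sales = pd.DataFrame(
    [
        ["2020-01-01", 0, 1, 2, 2],
        ["2020-01-02", 1, 1, 1, 3],
        ["2020-01-02", 2, 2, 1, 1],
        ["2020-01-03", 3, 3, 3, 1],
    ],
    columns=["date", "sale_id", "customer_id", "item_id", "quantity"],
)
item_df = pd.DataFrame([[1, 100], [2, 200], [3, 300]], columns=["item_id", "price"])

print(items.equals(item_df))       # True
print(sales.equals(manual_sales))  # True
```

Because the parent table is built only from values already present in the child, both approaches end up with the same items and the same sales columns; the manual route simply makes each step explicit.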