featuretools

Are `normalize_entity()` and `add_relationships()` logically the same in featuretools?


Example:

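import pandas as pd
import featuretools as ft  # pre-1.0 API (entity_from_dataframe, normalize_entity)

# Approach 1: split "items" out of "sales" with normalize_entity()
buy_log_df = pd.DataFrame(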
    [
        ["2020-01-01", 0, 1, 2, 2, 200],
        ["2020-01-02", 1, 1, 1, 3, 100],
        ["2020-01-02", 2, 2, 1, 1, 100],
        ["2020-01-03", 3, 3, 3, 1, 300],
    ],
    columns=['date', 'sale_id', 'customer_id', "item_id", "quantity", "price"]
)

es = ft.EntitySet(id="sale_set")
es = es.entity_from_dataframe(
    "sales",
    dataframe=buy_log_df,
    index="sale_id",
    time_index='date'
)
es = es.normalize_entity(
    new_entity_id="items",
    base_entity_id="sales",
    index="item_id",
    additional_variables=["price"],
)

# Approach 2: build "items" as its own table and link it with add_relationships()
buy_log_df = pd.DataFrame(
    [
        ["2020-01-01", 0, 1, 2, 2],
        ["2020-01-02", 1, 1, 1, 3],
        ["2020-01-02", 2, 2, 1, 1],
        ["2020-01-03", 3, 3, 3, 1],
    ],
    columns=['date', 'sale_id', 'customer_id', "item_id", "quantity"]
)
item_df = pd.DataFrame(
    [
        [1, 100],
        [2, 200],
        [3, 300],
    ],
    columns=['item_id', 'price']
)

es = ft.EntitySet(id="sale_set")
es = es.entity_from_dataframe(
    "sales",
    dataframe=buy_log_df,
    index="sale_id",
    time_index='date'
)
es = es.entity_from_dataframe(
    "items",
    dataframe=item_df,
    index="item_id",
)
from featuretools import Relationship
es = es.add_relationships(
    [Relationship(es['items']['item_id'], es['sales']['item_id'])],
)

The resulting `es` looks the same in both cases.

Is there any specific case where only `normalize_entity()` can be used?


Solution

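  • Thanks for the question. That's correct: the two entity sets are logically the same. There are no cases where only `normalize_entity()` can be used. It is a convenience method, and everything it does (creating the new entity from the unique values of the index, moving the additional variables into it, and adding the relationship) can also be done manually with `entity_from_dataframe()` and `add_relationships()`. One small difference: because `sales` has a time index, `normalize_entity()` by default also gives `items` a time index derived from `sales.date`; this is controlled by its `make_time_index` parameter, and the manual version has no such column unless you add it yourself.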
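To make the "done manually" part concrete, here is a minimal pandas-only sketch of the table split that `normalize_entity()` performs on the data. `normalize_manually` is a hypothetical helper written for this illustration, not a featuretools API. It assumes each `item_id` has exactly one `price` (it keeps the first one it sees), and it does not build an EntitySet or a relationship; in featuretools that step would still be `add_relationships()`.

```python
import pandas as pd

# Flat sales log from the question: price is still stored on every sale.
buy_log_df = pd.DataFrame(
    [
        ["2020-01-01", 0, 1, 2, 2, 200],
        ["2020-01-02", 1, 1, 1, 3, 100],
        ["2020-01-02", 2, 2, 1, 1, 100],
        ["2020-01-03", 3, 3, 3, 1, 300],
    ],
    columns=["date", "sale_id", "customer_id", "item_id", "quantity", "price"],
)

# Hand-built items table from the question's second approach.
item_df = pd.DataFrame([[1, 100], [2, 200], [3, 300]], columns=["item_id", "price"])


def normalize_manually(base_df, index, additional_columns):
    """Hypothetical helper: split base_df into (child, parent) tables.

    The parent gets one row per unique `index` value plus `additional_columns`
    (first occurrence wins). Those columns are dropped from the child, which
    keeps `index` as the foreign key pointing at the parent.
    """
    parent = (
        base_df[[index] + additional_columns]
        .drop_duplicates(subset=index)
        .sort_values(index)
        .reset_index(drop=True)
    )
    child = base_df.drop(columns=additional_columns)
    return child, parent


sales_df, items_df = normalize_manually(buy_log_df, "item_id", ["price"])

# The derived parent table matches the hand-built one exactly.
pd.testing.assert_frame_equal(items_df, item_df)

# Every sale points at an existing item: the condition a parent-child
# relationship on item_id relies on.
assert sales_df["item_id"].isin(items_df["item_id"]).all()
print(items_df)
```

Sorting by the index is only there so the result lines up row-for-row with `item_df`; the ordering has no effect on which items or prices end up in the parent table.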