[SOLVED] Using multiple Ids in featuretools

I have a dataset on which I would like to run automated feature engineering. However, it is time-series data, so to make it work I need to use two columns as ids: the object id and the date.

x = pd.DataFrame({'id': [1,2,1], 'date': [2012021,2032021,4052021], 'x1': [1,2,3]})
y = pd.DataFrame({'id': [1,2,1], 'date': [2012021,2032021,4052021], 'label': [3,2,1]})
entities = {"features": (x, ['id','date']), "labels": (y, ['id','date'])}
feature_matrix, features_defs = ft.dfs(entities=entities,target_entity="y")

When I run this I get this error:

TypeError: unhashable type: 'list'

How do I fix this?

Solution

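You are on the right track, but each entity needs a single unique index column; passing a list such as ['id', 'date'] as the index is what raises the error. Create one unique index for the entity set, then bring in the real identifier (id) through normalization before running dfs. I would recommend the steps below.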
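As an aside, the error message itself is ordinary Python behaviour and can be reproduced without featuretools. The sketch below assumes, based on the traceback, that the index name ends up somewhere a hashable value is required (for example as a dictionary key); the helper is_hashable is a hypothetical name used only for illustration.

```python
# Plain-Python illustration: a list cannot be hashed, so it cannot act as a
# dictionary key, while a string (a single column name) can.
def is_hashable(value):
    """Return True if `value` can be used as a dict key."""
    try:
        hash(value)
    except TypeError:
        return False
    return True


single_column = "id"           # one column name: hashable
two_columns = ["id", "date"]   # list of column names: not hashable

print(is_hashable(single_column))  # True
print(is_hashable(two_columns))    # False

# Using the list as a dictionary key raises the same kind of TypeError.
try:
    lookup = {two_columns: "features"}
except TypeError as err:
    message = str(err)
    print(message)
```

This is why a single index column is needed, whether it is a surrogate row number or some combined key.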

Create a single dataframe instead of two

import pandas as pd
import featuretools as ft

data = pd.DataFrame({'id': [1,2,1], 'date': [2012021,2032021,4052021], 'x1': [1,2,3], 'label': [3,2,1]})

Add a unique index column

data['index'] = data.index
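Before handing the dataframe to featuretools, it can be worth checking that the new column really is unique and that the (id, date) pair it replaces has no duplicates. The following is a minimal pandas-only sketch; the id_date column is a hypothetical alternative (one combined key instead of a row number) and is not part of the original answer.

```python
import pandas as pd

data = pd.DataFrame({
    'id': [1, 2, 1],
    'date': [2012021, 2032021, 4052021],
    'x1': [1, 2, 3],
    'label': [3, 2, 1],
})

# Surrogate key, as in the step above: one unique value per row.
data['index'] = data.index

# Sanity checks: the surrogate key is unique, and so is the (id, date)
# pair that it stands in for.
index_is_unique = data['index'].is_unique
pair_is_unique = not data.duplicated(subset=['id', 'date']).any()
print(index_is_unique, pair_is_unique)

# Hypothetical alternative: a single composite key built from both columns.
data['id_date'] = data['id'].astype(str) + '_' + data['date'].astype(str)
print(data['id_date'].tolist())
```

If the pair check fails, several rows share the same id and date, and a plain row number is the safer choice of index.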

Create the entity set

es = ft.EntitySet('My EntitySet')

Create an entity from the dataframe (using the single index column rather than both id and date)

es.entity_from_dataframe(
    entity_id='main_data',
    dataframe=data,
    index='index',
    time_index='date'
)

Normalize it so that id becomes the index of its own entity

es.normalize_entity(
    base_entity_id='main_data',
    new_entity_id='observations',
    index='id',
    make_time_index=True
)
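To get a feel for what this step produces, here is a rough pandas-only sketch. It assumes that normalizing on id yields one row per distinct id and that make_time_index=True records the first time each id appears; the column name first_date is hypothetical, and since the dates here are plain integers, "first" simply means the smallest value.

```python
import pandas as pd

data = pd.DataFrame({
    'id': [1, 2, 1],
    'date': [2012021, 2032021, 4052021],
    'x1': [1, 2, 3],
    'label': [3, 2, 1],
})

# One row per distinct id, keeping the smallest date seen for each id as a
# stand-in for the time index of the new entity (an assumption, see above).
observations = (
    data.groupby('id', as_index=False)['date']
        .min()
        .rename(columns={'date': 'first_date'})  # hypothetical column name
)
print(observations)

# Number of rows in the original data that belong to each id; these are the
# groups that aggregation features would be computed over.
rows_per_id = data.groupby('id').size().to_dict()
print(rows_per_id)
```

The original rows keep their id column, which is what links each of them back to its parent row in the new entity.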

Create features (remember to set options such as the aggregation primitives, e.g. agg_primitives, if you do not want the defaults)

feature_matrix, features_defs = ft.dfs(entityset=es, target_entity="main_data")

There might be another, or even better, way to deal with this; see this GitHub question or this SO answer.