I have a dataset on which I would like to run automatic feature engineering. However, it is time-series based, so to make it work I have to use two columns as ids: the object id and the date.
x = pd.DataFrame({'id': [1,2,1], 'date': [2012021,2032021,4052021], 'x1': [1,2,3]})
y = pd.DataFrame({'id': [1,2,1], 'date': [2012021,2032021,4052021], 'label': [3,2,1]})
entities = {"features": (x, ['id','date']), "labels": (y, ['id','date'])}
feature_matrix, features_defs = ft.dfs(entities=entities,target_entity="y")
When I run this I get this error:
TypeError: unhashable type: 'list'
How do I fix this?
You are right that both columns matter, but an entity's index has to be a single column name, not a list. Create a unique index for the entity set and then use the right one (id) in dfs. I would recommend this way:
import pandas as pd
import featuretools as ft

data = pd.DataFrame({'id': [1,2,1], 'date': [2012021,2032021,4052021], 'x1': [1,2,3], 'label': [3,2,1]})
# Use the row number as a unique, single-column index
data['index'] = data.index
es = ft.EntitySet('My EntitySet')
es.entity_from_dataframe(
    entity_id='main_data',
    dataframe=data,
    index='index',
    time_index='date'
)
# Split the object id out into its own entity, keyed by 'id'
es.normalize_entity(
    base_entity_id='main_data',
    new_entity_id='observations',
    index='id',
    make_time_index=True
)
feature_matrix, features_defs = ft.dfs(entityset=es, target_entity="main_data")
There might be another, or even better, way to deal with this; check this GitHub question or this SO answer.
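To see why a separate unique index is needed for this sample data, here is a minimal pandas-only sketch. It checks that id alone repeats while the (id, date) pair does not, and then builds the row-number index used above. The composite id_date column is my own illustration of an alternative single-column key, not something taken from the Featuretools docs, so treat it as an assumption.

```python
import pandas as pd

data = pd.DataFrame({
    'id': [1, 2, 1],
    'date': [2012021, 2032021, 4052021],
    'x1': [1, 2, 3],
    'label': [3, 2, 1],
})

# 'id' repeats across dates, so it cannot identify a row on its own.
id_is_unique = data['id'].is_unique

# The (id, date) pair does identify each row in this sample.
pair_is_unique = not data.duplicated(subset=['id', 'date']).any()

# Option A (as in the answer): a surrogate row-number index.
data['index'] = data.index

# Option B (hypothetical alternative): fold both columns into one string key.
data['id_date'] = data['id'].astype(str) + '_' + data['date'].astype(str)

print(id_is_unique, pair_is_unique)
print(data[['index', 'id_date']])
```

Either column can then be passed as a single index name. The row-number index is simpler, while the composite key keeps the original id and date readable in the index itself.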