python machine-learning featuretools

Proper way to pass cutoff_time to dfs in featuretools 1.0.0


I recently updated featuretools to v1.0.0 and ran into the following issue. I have instances that vary over time, and I want to build time-dependent features for them. I also want to keep some historical characteristics of those instances, so my cutoff time dataset has the columns time, instance_id, feature1, feature2, ..., target.

When I tried to run dfs, I got the error 'NoneType' object has no attribute 'logical_types'.

I found that it is raised inside the internal function get_ww_types_from_features.

That function reads the column types of the cutoff time DataFrame, assuming it already has Woodwork typing information:

        cutoff_schema = cutoff_time.ww.schema
        for column in pass_columns:
            logical_types[column] = cutoff_schema.logical_types[column]
            semantic_tags[column] = cutoff_schema.semantic_tags[column]
            origins[column] = "base"

But my cutoff time is originally a plain pandas DataFrame, and I haven't found the place in the code where it gets converted to Woodwork. The documentation also says it is fine to pass the cutoff time as a pandas DataFrame.

So my question is: what is the proper way to pass a cutoff time DataFrame? If a pandas DataFrame is supported, is there a bug in the code? If not, should I initialize Woodwork on the cutoff time manually before calling dfs?


Solution

  • Featuretools 1.0.0 has a bug that occurs when cutoff_time is a DataFrame with additional columns (e.g. labels) and multiple workers are used via the n_jobs or dask_kwargs options. This may be the issue you are encountering. The bug was fixed in Featuretools 1.1.0.

    A pandas DataFrame is meant to be accepted for cutoff_time in 1.0.0, so no manual Woodwork conversion should be needed.