
featuretools: how can I apply the `time_since` and `time_since_first` primitives to an integer time index?


When the time index is an integer (e.g. starting from 0 for each user), running DFS shows this warning:

UnusedPrimitiveWarning: Some specified primitives were not used during DFS:
  agg_primitives: ['avg_time_between', 'time_since_first', 'time_since_last', 'trend']
  groupby_trans_primitives: ['cum_count', 'time_since', 'time_since_previous']
This may be caused by a using a value of max_depth that is too small, not setting interesting values, or it may indicate no compatible variable types for the primitive were found in the data.

However, the time index is an integer in many cases (e.g. https://www.kaggle.com/c/riiid-test-answer-prediction/data).

In this case, even though I set the timestamp variable as ft.variable_types.TimeIndex(numeric_time_index) when creating the entityset, the same warning appeared and no features were generated by ['avg_time_between', 'time_since_first', 'time_since_last', 'trend'].

How can I handle it?


Solution

  • Thanks for the question. The time_since and time_since_first primitives currently accept only Datetime and DatetimeTimeIndex variables, so DFS finds no compatible input for them when the time index is numeric. To handle a numeric time index, you can create custom primitives that take NumericTimeIndex variables as input.

    from featuretools.primitives import AggregationPrimitive, TransformPrimitive
    from featuretools.variable_types import NumericTimeIndex
    
    
    class TimeSinceNumeric(TransformPrimitive):
        input_types = [NumericTimeIndex]
        ...
    
    
    class TimeSinceFirstNumeric(AggregationPrimitive):
        input_types = [NumericTimeIndex]
        ...
    

    Then, you can pass in the custom primitives directly to DFS.

    ft.dfs(
        ...
        trans_primitives=[TimeSinceNumeric],
        agg_primitives=[TimeSinceFirstNumeric],
    )
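
The primitive bodies above are elided, so here is a minimal sketch in plain Python of the arithmetic such custom primitives might wrap. The function names, the `cutoff` argument, and the sample timestamps are all assumptions for illustration and are not part of featuretools; inside a real primitive the same logic would typically operate on a pandas Series rather than a list.

```python
# Hypothetical helpers, not featuretools API: plain-Python versions of the
# calculations that numeric-time-index primitives could perform.

def time_since_numeric(values, cutoff):
    """Transform-style: elapsed numeric time from each value to the cutoff."""
    return [cutoff - v for v in values]


def time_since_first_numeric(values, cutoff):
    """Aggregation-style: elapsed numeric time from the earliest value to the cutoff."""
    return cutoff - min(values)


def time_since_last_numeric(values, cutoff):
    """Aggregation-style: elapsed numeric time from the latest value to the cutoff."""
    return cutoff - max(values)


def avg_time_between_numeric(values):
    """Aggregation-style: mean gap between consecutive values (None if fewer than two)."""
    if len(values) < 2:
        return None
    return (max(values) - min(values)) / (len(values) - 1)


# One user's integer timestamps, starting from 0 as in the question.
# The cutoff value is an assumed "current time" on the same integer scale.
timestamps = [0, 30, 45, 100]
cutoff = 120

since = time_since_numeric(timestamps, cutoff)
since_first = time_since_first_numeric(timestamps, cutoff)
since_last = time_since_last_numeric(timestamps, cutoff)
avg_between = avg_time_between_numeric(timestamps)

print(since, since_first, since_last, avg_between)
```

Because the time index is already a plain number, no unit conversion is needed: each result is expressed in whatever unit the integer timestamps use.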