As of version v0.12.0, FeatureTools allows you to assign custom names to multi-output primitives: https://github.com/alteryx/featuretools/pull/794. By default, when you define a custom multi-output primitive, the column names of the generated features are suffixed with [0], [1], [2], etc. Say I have the following code to define a multi-output primitive:
import numpy as np
import featuretools.variable_types as vtypes
from featuretools.primitives import make_trans_primitive

def sine_and_cosine_datestamp(column):
    """
    Returns the Sin and Cos of the hour of datestamp
    """
    sine_hour = np.sin(column.dt.hour)
    cosine_hour = np.cos(column.dt.hour)
    ret = [sine_hour, cosine_hour]
    return ret

Sine_Cosine_Datestamp = make_trans_primitive(function = sine_and_cosine_datestamp,
                                             input_types = [vtypes.Datetime],
                                             return_type = vtypes.Numeric,
                                             number_output_features = 2)
In the dataframe generated by DFS, the two generated columns will be named SINE_AND_COSINE_DATESTAMP(datestamp)[0] and SINE_AND_COSINE_DATESTAMP(datestamp)[1]. I would prefer the column names to reflect the operation applied to the column, something like SINE_AND_COSINE_DATESTAMP(datestamp)[sine] and SINE_AND_COSINE_DATESTAMP(datestamp)[cosine]. Apparently you have to use the generate_names method to do so. I could not find anything online explaining how to use this method, and I kept running into errors. For example, when I tried the following code:
def sine_and_cosine_datestamp(column, string = ['sine, cosine']):
    """
    Returns the Sin and Cos of the hour of the datestamp
    """
    sine_hour = np.sin(column.dt.hour)
    cosine_hour = np.cos(column.dt.hour)
    ret = [sine_hour, cosine_hour]
    return ret

def sine_and_cosine_generate_names(self, base_feature_names):
    return u'STRING_COUNT(%s, "%s")' % (base_feature_names[0], self.kwargs['string'])

Sine_Cosine_Datestamp = make_trans_primitive(function = sine_and_cosine_datestamp,
                                             input_types = [vtypes.Datetime],
                                             return_type = vtypes.Numeric,
                                             number_output_features = 2,
                                             description = "For each value in the base feature"
                                             "outputs the sine and cosine of the hour, day, and month.",
                                             cls_attributes = {'generate_names': sine_and_cosine_generate_names})
I got an assertion error. What's even more perplexing to me is that when I looked at the transform_primitive_base.py file in the featuretools/primitives/base folder, I saw that the generate_names function looks like this:
def generate_names(self, base_feature_names):
    n = self.number_output_features
    base_name = self.generate_name(base_feature_names)
    return [base_name + "[%s]" % i for i in range(n)]
In the function above, it looks like there is no way that you can generate custom primitive names since it uses the base_feature_names and the number of output features by default. Any help would be appreciated.
Thanks for the question! This feature hasn't been documented well.
The main issue with your code was that sine_and_cosine_generate_names should return a list of strings, one for each output column.
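To see the problem in isolation, the sketch below calls the naming function from the question outside of FeatureTools. The `fake_primitive` object is a hypothetical stand-in that carries only the `kwargs` attribute the function reads; it is not how FeatureTools constructs primitives.

```python
from types import SimpleNamespace

def sine_and_cosine_generate_names(self, base_feature_names):
    # Version from the question: formats ONE string instead of building a list.
    return u'STRING_COUNT(%s, "%s")' % (base_feature_names[0], self.kwargs['string'])

# Hypothetical stand-in for a primitive instance; only `kwargs` is needed here.
fake_primitive = SimpleNamespace(kwargs={'string': ['sine, cosine']})

names = sine_and_cosine_generate_names(fake_primitive, ['datestamp'])

# With number_output_features = 2, two names are needed (one per column),
# but the function hands back a single str.
print(type(names).__name__)
print(isinstance(names, list))
```

The result is a plain string, so there is no one-to-one mapping between names and the two output columns.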
It looks like you were adapting the StringCount example from the docs. I think for this primitive it would be less error-prone to always use "sine" and "cosine" for the custom names, and to remove the optional string argument from sine_and_cosine_datestamp. I also updated the feature name text to match your desired text.
After these changes:
def sine_and_cosine_datestamp(column):
    """
    Returns the Sin and Cos of the hour of the datestamp
    """
    sine_hour = np.sin(column.dt.hour)
    cosine_hour = np.cos(column.dt.hour)
    ret = [sine_hour, cosine_hour]
    return ret

def sine_and_cosine_generate_names(self, base_feature_names):
    template = 'SINE_AND_COSINE_DATESTAMP(%s)[%s]'
    return [template % (base_feature_names[0], string) for string in ['sine', 'cosine']]
This created feature column names like SINE_AND_COSINE_DATESTAMP(order_date)[sine]. No changes were necessary to the actual make_trans_primitive call.
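As a quick sanity check, the naming function can be called directly, without running DFS. This is only a sketch: `None` is passed for `self` because the function never uses it, and `order_date` is the example base feature name mentioned above.

```python
def sine_and_cosine_generate_names(self, base_feature_names):
    template = 'SINE_AND_COSINE_DATESTAMP(%s)[%s]'
    return [template % (base_feature_names[0], string) for string in ['sine', 'cosine']]

# `self` is unused by this function, so None is enough for a standalone check.
names = sine_and_cosine_generate_names(None, ['order_date'])
for name in names:
    print(name)
# SINE_AND_COSINE_DATESTAMP(order_date)[sine]
# SINE_AND_COSINE_DATESTAMP(order_date)[cosine]
```

The list has exactly two entries, matching number_output_features = 2.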
"In the function above, it looks like there is no way that you can generate custom primitive names since it uses the base_feature_names and the number of output features by default."
That is the default generate_names function for transform primitives. Since we are assigning this custom generate_names function to Sine_Cosine_Datestamp, the default will not be used.
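The override can be illustrated with plain Python classes. This is a simplified sketch, not FeatureTools' actual implementation: `DefaultNamingPrimitive`, its `generate_name` method, and the use of `type()` to apply `cls_attributes` are all assumptions made for illustration. Only the body of the default `generate_names` is taken from the code quoted in the question.

```python
class DefaultNamingPrimitive:
    """Hypothetical stand-in for a transform primitive base class."""
    name = 'sine_and_cosine_datestamp'
    number_output_features = 2

    def generate_name(self, base_feature_names):
        # Assumed single-output naming scheme, for illustration only.
        return '%s(%s)' % (self.name.upper(), ', '.join(base_feature_names))

    def generate_names(self, base_feature_names):
        # Same logic as the default quoted in the question.
        n = self.number_output_features
        base_name = self.generate_name(base_feature_names)
        return [base_name + '[%s]' % i for i in range(n)]


def sine_and_cosine_generate_names(self, base_feature_names):
    template = 'SINE_AND_COSINE_DATESTAMP(%s)[%s]'
    return [template % (base_feature_names[0], s) for s in ['sine', 'cosine']]


# In this sketch, applying cls_attributes means building a subclass whose
# attribute dict contains the custom method, which shadows the inherited one.
cls_attributes = {'generate_names': sine_and_cosine_generate_names}
CustomNamingPrimitive = type('CustomNamingPrimitive',
                             (DefaultNamingPrimitive,),
                             cls_attributes)

default_names = DefaultNamingPrimitive().generate_names(['datestamp'])
custom_names = CustomNamingPrimitive().generate_names(['datestamp'])
print(default_names)
print(custom_names)
```

Because attribute lookup finds the subclass method first, the inherited default is never called for the custom primitive.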
Hope that helps, let me know if you still have questions!