I created a custom primitive like the one below.
class Correlate(TransformPrimitive):
    name = 'correlate'
    input_types = [Numeric, Numeric]
    return_type = Numeric
    commutative = True
    compatibility = [Library.PANDAS, Library.DASK, Library.KOALAS]

    def get_function(self):
        def correlate(column1, column2):
            return np.correlate(column1, column2, "same")
        return correlate
Then, just in case, I checked the calculation directly, as below.
np.correlate(feature_matrix["alcohol"], feature_matrix["chlorides"],mode="same")
However, the result from the custom primitive and the result from calling np.correlate directly were different.
Do you know why they differ?
If my code is fundamentally wrong, please correct me.
Thanks for the question! You can create a custom primitive with a fixed argument to calculate that kind of correlation by using TransformPrimitive as a base class. I will go through an example using this data.
import pandas as pd
data = [
[0.40168819, 0.0857946],
[0.06268886, 0.27811651],
[0.16931269, 0.96509497],
[0.15123022, 0.80546244],
[0.58610794, 0.56928692],
]
df = pd.DataFrame(data=data, columns=list('ab'))
df.reset_index(inplace=True)
df
index a b
0 0.401688 0.085795
1 0.062689 0.278117
2 0.169313 0.965095
3 0.151230 0.805462
4 0.586108 0.569287
The function np.correlate acts as a transform when the parameter mode='same': for two input columns of equal length, the output has that same length, one value per row. So define a custom primitive by using TransformPrimitive as a base class.
from featuretools.primitives import TransformPrimitive
from featuretools.variable_types import Numeric
import numpy as np
class Correlate(TransformPrimitive):
    name = 'correlate'
    input_types = [Numeric, Numeric]
    return_type = Numeric

    def get_function(self):
        def correlate(a, b):
            return np.correlate(a, b, mode='same')
        return correlate
The DFS call requires the data to be structured into an EntitySet. Then you can use the custom primitive.
import featuretools as ft
es = ft.EntitySet()
es.entity_from_dataframe(
    entity_id='data',
    dataframe=df,
    index='index',
)
fm, fd = ft.dfs(
    entityset=es,
    target_entity='data',
    trans_primitives=[Correlate],
    max_depth=1,
)
fm[['CORRELATE(a, b)']]
CORRELATE(a, b)
index
0 0.534548
1 0.394685
2 0.670774
3 0.670506
4 0.622236
The values in the feature matrix should match the output of np.correlate.
actual = fm['CORRELATE(a, b)'].values
expected = np.correlate(df['a'], df['b'], mode='same')
np.testing.assert_array_equal(actual, expected)
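When comparing the two results on your own data, one thing worth keeping in mind is that each output value of np.correlate depends on the whole column and on the order of its rows. The sketch below uses the same two columns and a hypothetical reordering (sorting by column a) to show that computing on reordered rows gives different per-row values. This is only an illustration of that dependence, under the assumption that the rows reach the function in a different order; it may or may not be the cause of the mismatch in the question.

```python
import numpy as np

# Columns a and b from the example data above.
a = np.array([0.40168819, 0.06268886, 0.16931269, 0.15123022, 0.58610794])
b = np.array([0.0857946, 0.27811651, 0.96509497, 0.80546244, 0.56928692])

# Result with the rows in their original order.
original = np.correlate(a, b, mode='same')

# Hypothetical reordering of the rows (here: sorted by column a).
order = np.argsort(a)
reordered = np.correlate(a[order], b[order], mode='same')

# Put the reordered results back into the original row order so the
# two results can be compared row by row.
realigned = np.empty_like(reordered)
realigned[order] = reordered

rows_match = np.allclose(original, realigned)
print(rows_match)
```

So when checking a feature matrix against a direct np.correlate call, it helps to make sure both are computed over the same rows in the same order.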
You can learn more about defining simple custom primitives and advanced custom primitives in the linked pages. Let me know if you found this helpful.