I am trying to apply target encoding to categorical features using the category_encoders.TargetEncoder
in Python. However, I keep getting the following error:
AttributeError: 'numpy.ndarray' object has no attribute 'groupby'
from category_encoders import TargetEncoder
from sklearn.model_selection import train_test_split
# Features for target encoding
encoding_cols = ['grade', 'sub_grade', 'home_ownership', 'verification_status',
                 'purpose', 'application_type', 'zipcode']
# Train-Test Split
X_train_cv, X_test, y_train_cv, y_test = train_test_split(x, y, test_size=0.25, random_state=1)
X_train, X_test_cv, y_train, y_test_cv = train_test_split(X_train_cv, y_train_cv, test_size=0.25, random_state=1)
# Initialize the Target Encoder
encoder = TargetEncoder()
# Apply Target Encoding
for i in encoding_cols:
    X_train[i] = encoder.fit_transform(X_train[i], y_train)  # **Error occurs here**
    X_test_cv[i] = encoder.transform(X_test_cv[i])
    X_test[i] = encoder.transform(X_test[i])
I want to apply target encoding to the categorical columns without hitting the "'numpy.ndarray' object has no attribute 'groupby'" error.
This is interesting. I can reproduce your error.
It is related to the dtype of y_train. To solve the issue, you need to force a conversion through its list values and set the name and index explicitly.
y_train = pd.Series(y_train.tolist(), name='loan_status', index=y_train.index)
This will convert your initial dtype of CategoricalDtype(categories=[1, 0], ordered=False, categories_dtype=int64) to dtype('int64').
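To check the effect of that conversion in isolation, here is a minimal sketch using only pandas. The small y_train built here is a hypothetical stand-in for the real target (the values and index are made up); only the conversion line comes from the answer above.

```python
import pandas as pd

# Hypothetical stand-in for the real target: a categorical Series
# with a non-default index, as it would look after train_test_split.
y_train = pd.Series([1, 0, 0, 1], index=[10, 11, 12, 13], name="loan_status")
y_train = y_train.astype(pd.CategoricalDtype(categories=[1, 0]))
print(y_train.dtype)  # category

# The conversion from the answer: go through a plain Python list,
# then rebuild the Series with the original name and index.
y_fixed = pd.Series(y_train.tolist(), name="loan_status", index=y_train.index)
print(y_fixed.dtype)  # int64
```

Keeping the original index matters because the rows of y_train must still line up with the rows of X_train.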
So your last cell in the Colab is now:
import pandas as pd
import category_encoders as ce
# Initialize TargetEncoder for all categorical columns at once
encoder = ce.TargetEncoder(cols=encoding_cols)
# Here is the list conversion and back to series
y_train = pd.Series(y_train.tolist(), index=y_train.index)
# Fit and transform the training data
X_train = encoder.fit_transform(X_train, y_train)
And this works fine.
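For intuition about why the target has to be numeric, here is a rough sketch of plain mean encoding in pandas. This is not how TargetEncoder is implemented (the library also applies smoothing); the tiny train and test frames and the fallback to the global mean for unseen categories are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical toy data: one categorical feature and a numeric 0/1 target.
train = pd.DataFrame({"grade": ["A", "B", "A", "C"], "loan_status": [1, 0, 0, 1]})
test = pd.DataFrame({"grade": ["A", "C", "D"]})

# Learn the mapping on the training data only: mean target per category.
means = train.groupby("grade")["loan_status"].mean()
global_mean = train["loan_status"].mean()

# Apply the learned mapping to both splits; categories never seen in
# training ("D" here) fall back to the global training mean.
train_enc = train["grade"].map(means)
test_enc = test["grade"].map(means).fillna(global_mean)

print(train_enc.tolist())
print(test_enc.tolist())
```

The per-category mean is only defined for a numeric target, which fits the observation above that converting y_train to int64 resolves the problem.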