Tags: python, numpy, scikit-learn, numpy-ndarray

'numpy.ndarray' object has no attribute 'groupby'


I am trying to apply target encoding to categorical features using the category_encoders.TargetEncoder in Python. However, I keep getting the following error:

AttributeError: 'numpy.ndarray' object has no attribute 'groupby'

Here is my code:

from category_encoders import TargetEncoder
from sklearn.model_selection import train_test_split

# Features for target encoding
encoding_cols = ['grade', 'sub_grade', 'home_ownership', 'verification_status', 
                 'purpose', 'application_type', 'zipcode']

# Train-Test Split
X_train_cv, X_test, y_train_cv, y_test = train_test_split(x, y, test_size=0.25, random_state=1)
X_train, X_test_cv, y_train, y_test_cv = train_test_split(X_train_cv, y_train_cv, test_size=0.25, random_state=1)

# Initialize the Target Encoder
encoder = TargetEncoder()

# Apply Target Encoding
for i in encoding_cols:
    X_train[i] = encoder.fit_transform(X_train[i], y_train)  # error occurs here
    X_test_cv[i] = encoder.transform(X_test_cv[i])
    X_test[i] = encoder.transform(X_test[i])

I want to apply target encoding to the categorical columns without hitting the 'numpy.ndarray' object has no attribute 'groupby' error.


Solution

  • This is interesting. I can reproduce your error.

    It is related to the dtype of y_train, which is categorical. To fix it, rebuild the Series from its list of values and set the name and index explicitly:

    y_train = pd.Series(y_train.tolist(), name='loan_status', index=y_train.index)

    This converts the original dtype, CategoricalDtype(categories=[1, 0], ordered=False, categories_dtype=int64), to dtype('int64').

    So the last cell in your Colab notebook is now:

    import pandas as pd
    import category_encoders as ce

    # Initialize TargetEncoder
    encoder = ce.TargetEncoder(cols=encoding_cols)
    
    # Here is the list conversion and back to series
    y_train = pd.Series(y_train.tolist(), index=y_train.index)
    
    # Fit and transform the training data
    X_train = encoder.fit_transform(X_train, y_train)
    

    This works fine.
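To see what the conversion does on its own, here is a minimal pandas-only sketch. The toy labels and index values are made up for illustration and are not taken from the notebook. The astype("int64") line is an alternative I am assuming is equivalent for integer categories like these; I have not checked it against the original notebook.

```python
import pandas as pd

# Hypothetical stand-in for the target column: 0/1 labels stored as a categorical
y_train = pd.Series(
    [1, 0, 0, 1],
    index=[10, 11, 12, 13],  # non-default index, as you would get after train_test_split
    name="loan_status",
    dtype="category",
)

# Conversion from the answer: go through a plain Python list,
# keeping the original index so rows still line up with X_train
y_via_list = pd.Series(y_train.tolist(), name="loan_status", index=y_train.index)

# Assumed shorter alternative: cast the categorical directly to integers
y_via_astype = y_train.astype("int64")

print(y_train.dtype)       # categorical dtype before the conversion
print(y_via_list.dtype)    # integer dtype after the conversion
print(y_via_list.equals(y_via_astype))
```

Either converted Series can then be passed as y to encoder.fit_transform(X_train, y). The fitted encoder can be reused on the held-out sets with encoder.transform, as in the question.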