python scikit-learn imbalanced-data imblearn

AttributeError: 'SMOTE' object has no attribute '_validate_data'


I'm resampling my data (multiclass) by using SMOTE.

from imblearn.over_sampling import SMOTE

sm = SMOTE(random_state=1)
X_res, Y_res = sm.fit_resample(X_train, Y_train)

However, I'm getting this attribute error. Can anyone help?


Solution

  • Short answer

    You need to upgrade scikit-learn to version 0.23.1.

    Long answer

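    Version 0.7.0 of imbalanced-learn (the newest at the time of writing) seems to have an undocumented dependency on scikit-learn 0.23.1. If your scikit-learn is 0.22 or older, resampling fails with AttributeError: 'SMOTE' object has no attribute '_validate_data'. The likely reason is that the sampler calls the _validate_data method, which scikit-learn's BaseEstimator only gained in 0.23.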
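Before changing anything, it can help to confirm which versions are actually installed. The following is a minimal sketch that uses only the standard library (Python 3.8 or later); the helper names major_minor, meets_minimum and installed_version are made up for this illustration, and the (0, 23) threshold is simply the requirement described above, not something read from the imbalanced-learn package metadata.

```python
import re
from importlib import metadata


def major_minor(version):
    """Return (major, minor) as ints from a version string such as '0.23.1'."""
    parts = []
    for piece in version.split(".")[:2]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 2:
        parts.append(0)
    return tuple(parts)


def meets_minimum(version, minimum=(0, 23)):
    """True if the version is at least the assumed minimum (0.23 by default)."""
    return major_minor(version) >= minimum


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


for package in ("scikit-learn", "imbalanced-learn"):
    found = installed_version(package)
    print(package, "->", found if found is not None else "not installed")

sklearn_version = installed_version("scikit-learn")
if sklearn_version is not None and not meets_minimum(sklearn_version):
    print("scikit-learn is older than 0.23; SMOTE from imbalanced-learn 0.7.0 will likely fail")
```

Run it with the same interpreter that your notebook or IDE uses, otherwise it reports the versions of a different environment.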

    If you are using Anaconda, installing scikit-learn 0.23.1 can be tricky. conda update scikit-learn might not upgrade scikit-learn to version 0.23 or higher, because the newest scikit-learn version Conda offers at this point in time is 0.22.1. If you try conda install scikit-learn=0.23.1 or pip install scikit-learn==0.23.1 in an existing environment, you will trigger a long series of compatibility checks and the installation may be slow. The easiest route in Anaconda is therefore to create a new virtual environment with a minimal set of packages, so that there are few or no conflicts. Then, in the new environment, install scikit-learn 0.23.1 followed by imbalanced-learn 0.7.0.

    conda create -n test python=3.7.6
    conda activate test
    pip install scikit-learn==0.23.1
    pip install imbalanced-learn==0.7.0
    

    Finally, you need to install your IDE in the new virtual environment as well; otherwise it keeps running the old environment and cannot see these packages.

    However, once scikit-learn version 0.23.1 becomes available in Conda and there are no compatibility issues, you can install it in the base environment directly.