python machine-learning scikit-learn imblearn

ImportError: cannot import name '_get_column_indices' from 'sklearn.utils'


I am getting an ImportError when trying to import RandomOverSampler from imblearn.over_sampling. I believe the issue is not with my code but with clashing library versions, though I'm not sure.

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from sklearn.preprocessing import StandardScaler    #actually scikit-learn
from imblearn.over_sampling import RandomOverSampler

Code that uses StandardScaler and RandomOverSampler:

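def scale_dataset(dataframe, oversample=False):
    # Features are every column except the last; the last column is the label.
    X = dataframe[dataframe.columns[:-1]].values
    Y = dataframe[dataframe.columns[-1]].values

    # Standardize each feature to zero mean and unit variance.
    scaler = StandardScaler()
    X = scaler.fit_transform(X)

    if oversample:
        # Randomly duplicate minority-class rows until the classes are balanced.
        ros = RandomOverSampler()
        X, Y = ros.fit_resample(X, Y)

    # Recombine features and labels into a single 2-D array.
    data = np.hstack((X, np.reshape(Y, (-1, 1))))
    return data, X, Y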

print(len(train[train["class"]==1]))
print(len(train[train["class"]==0]))

train, X_train, Y_train = scale_dataset(train, True)

I tried importing sklearn in full, uninstalling and reinstalling scipy and sklearn (as scikit-learn), and installing TensorFlow. I do have numpy, scipy, pandas, and the other dependencies installed.


Solution

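  • This is a known issue (https://github.com/scikit-learn-contrib/imbalanced-learn/issues/1081#issuecomment-2127245933): scikit-learn 1.5 no longer exposes _get_column_indices from sklearn.utils, and older imbalanced-learn releases still try to import it from there. You can either install imbalanced-learn from its master branch:

    pip install git+https://github.com/scikit-learn-contrib/imbalanced-learn.git@master

    or downgrade scikit-learn to a release before 1.5 (for example, 1.4.2).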
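
To confirm that a version clash is really the cause before reinstalling anything, a quick check along these lines may help. It is only a sketch: the helper names installed_version and has_private_helper are made up for illustration, and it relies solely on the standard library's importlib, so it runs even when one of the packages is missing.

```python
from importlib import import_module
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def has_private_helper(module_name="sklearn.utils", attr="_get_column_indices"):
    """Report whether the module exposes the name that imblearn tries to import."""
    try:
        module = import_module(module_name)
    except ImportError:
        # The module itself is not importable, so the name cannot be there.
        return False
    return hasattr(module, attr)


if __name__ == "__main__":
    # Distribution names as published on PyPI (not the import names).
    for dist in ("scikit-learn", "imbalanced-learn"):
        print(dist, installed_version(dist))
    print("sklearn.utils._get_column_indices present:", has_private_helper())
```

If the last line prints False while scikit-learn is installed, the installed imbalanced-learn is likely too old for that scikit-learn version, which matches the situation described in the linked issue.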