One of my datasets is imbalanced: the minority class has very few samples compared to the majority class, so I want to balance the data by undersampling the majority class. When I try to use RandomUnderSampler from the imblearn package on a 3D array, I get this error:
ValueError: Found array with dim 3. Estimator expected <= 2.
The features are stored as a 3D array:
train['X'].shape
(276216, 101, 4)
The labels:
train['y'].shape
(276216, 1)
When I run the following to randomly undersample the data:
from imblearn.under_sampling import RandomUnderSampler
undersample = RandomUnderSampler(sampling_strategy='majority')
X_train_under, y_train_under = undersample.fit(train['X'], train['y'])
I get the above error. Any help would be appreciated.
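The sampler expects a 2D array of shape (n_samples, n_features), so flatten each sample into a single row before passing it in. Also, as per the docs, you have to call fit_resample rather than fit: fit returns the fitted sampler itself, not the resampled arrays, so it cannot be unpacked into X and y.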
X = train['X'].reshape(train['X'].shape[0], -1)
X_train_under, y_train_under = undersample.fit_resample(X, train['y'])
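After resampling, X_train_under is still 2D, so if the downstream model needs the original (101, 4) layout per sample, it has to be reshaped back. Here is a minimal NumPy-only sketch of that round trip. The small random arrays stand in for train['X'] and train['y'], and the manual index selection is only a stand-in for fit_resample (it is an assumption for illustration, not how imblearn works internally):

```python
import numpy as np

rng = np.random.default_rng(0)

# Small stand-ins for train['X'] and train['y'] (sizes are illustrative only).
X = rng.normal(size=(20, 101, 4))
y = np.array([0] * 16 + [1] * 4).reshape(-1, 1)

# Flatten each sample into one row, as the sampler requires.
X_2d = X.reshape(X.shape[0], -1)  # shape (20, 404)

# Stand-in for the resampling step: keep every minority row plus an
# equal number of randomly chosen majority rows.
labels = y.ravel()
minority_idx = np.flatnonzero(labels == 1)
majority_idx = rng.choice(
    np.flatnonzero(labels == 0), size=minority_idx.size, replace=False
)
keep = np.sort(np.concatenate([majority_idx, minority_idx]))
X_under_2d, y_under = X_2d[keep], labels[keep]

# Restore the original per-sample layout.
X_under = X_under_2d.reshape(-1, *X.shape[1:])  # shape (8, 101, 4)
```

With the real sampler, the same last step would be X_train_under.reshape(-1, 101, 4). Because reshape only changes how each row is viewed, every restored sample is identical to the corresponding original one.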