I have a data frame of text that I want to classify, but the classes are imbalanced, so I need to oversample the minority class first. Sample data is below:
import pandas as pd

# 10 'Positive' rows and 4 'Negative' rows
df = pd.DataFrame({
    'Features': ['I am going to class today'] * 10 + ['I am not going to class today'] * 4,
    'Class': ['Positive'] * 10 + ['Negative'] * 4,
})
df
Features Class
0 I am going to class today Positive
1 I am going to class today Positive
2 I am going to class today Positive
3 I am going to class today Positive
4 I am going to class today Positive
5 I am going to class today Positive
6 I am going to class today Positive
7 I am going to class today Positive
8 I am going to class today Positive
9 I am going to class today Positive
10 I am not going to class today Negative
11 I am not going to class today Negative
12 I am not going to class today Negative
13 I am not going to class today Negative
from collections import Counter
from imblearn.over_sampling import RandomOverSampler

oversample = RandomOverSampler(sampling_strategy='minority')
# fit and apply the transform
X_over, y_over = oversample.fit_resample(df['Features'], df['Class'])
# summarize class distribution
print(Counter(y_over))
But this does not work and raises ValueError: Expected 2D array, got 1D array instead. How can I oversample this data?
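I found the problem. fit_resample expects X to be 2D, with shape (n_samples, n_features), but df['Features'] is a 1D Series. Reshaping it into a single column fixes the error (selecting the column as a one-column frame with df[['Features']] should work as well):
X_over, y_over = oversample.fit_resample(df['Features'].values.reshape(-1,1), df['Class'])
This works now, and the classes are balanced: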
Counter({'Positive': 10, 'Negative': 10})
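For anyone wondering what random oversampling of the minority class actually does to the data, here is a minimal standard-library sketch of the idea. It is not imbalanced-learn's implementation: the helper name oversample_minority and its seed parameter are made up for illustration, and it only handles the two-class case shown above by duplicating randomly chosen minority rows until both classes have the same count.

```python
import random
from collections import Counter

def oversample_minority(X, y, seed=0):
    """Hypothetical helper: duplicate random minority-class rows
    until the minority class is as large as the majority class."""
    rng = random.Random(seed)
    counts = Counter(y)
    minority = min(counts, key=counts.get)   # label with the fewest rows
    target = max(counts.values())            # size of the largest class
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    # Draw (with replacement) as many extra minority rows as are missing
    extra = [rng.choice(minority_idx) for _ in range(target - counts[minority])]
    X_over = list(X) + [X[i] for i in extra]
    y_over = list(y) + [y[i] for i in extra]
    return X_over, y_over

X = ['I am going to class today'] * 10 + ['I am not going to class today'] * 4
y = ['Positive'] * 10 + ['Negative'] * 4

X_over, y_over = oversample_minority(X, y)
print(Counter(y_over))
```

Note that the new rows are exact copies of existing minority rows, so this balances the class counts without adding any new information for the classifier.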