I have an imbalanced dataset that I am balancing with the SMOTE algorithm. After oversampling, I printed the confusion matrix and got the following output:
Support: '0' 91 Support: '1' 209
My dataset has 1000 samples: label 1 occurs 700 times and label 0 occurs 300 times. I am using a test size of 0.3, so why does it show a support of 91 and 209?
The support is 91 and 209 whether or not I apply SMOTE.
Clarification
First, this is not a confusion matrix; it is a classification report, which gathers metrics that can be calculated from a confusion matrix. The support column is the number of true samples of each class in the data being evaluated.
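To make the relationship concrete, here is a minimal sketch of how the report's columns can be derived from a confusion matrix. The cell values in `cm` and the helper name `report_from_confusion` are invented for illustration; only the row sums (91 and 209) come from the question.

```python
# Hypothetical confusion matrix (rows = true class, columns = predicted class).
# The individual cell values are invented; only the row sums match the question.
cm = [[60, 31],
      [25, 184]]


def report_from_confusion(cm):
    """Derive per-class precision, recall and support from a confusion matrix."""
    n = len(cm)
    report = {}
    for k in range(n):
        true_pos = cm[k][k]
        predicted_k = sum(cm[i][k] for i in range(n))  # column sum
        support = sum(cm[k])                           # row sum = true count
        report[k] = {
            "precision": true_pos / predicted_k if predicted_k else 0.0,
            "recall": true_pos / support if support else 0.0,
            "support": support,
        }
    return report


report = report_from_confusion(cm)
for label, row in report.items():
    print(label, row)
```

Support depends only on the true labels of the evaluated data, not on the model's predictions, which is why it does not move when the training procedure changes.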
Purpose of SMOTE
Secondly, you apply SMOTE to your training set only. If you are following someone's code, they will have trained their model on the training data oversampled with SMOTE, while the testing is done on the original, untouched test data (which is logical).
The purpose of using SMOTE on your training set is to reduce its imbalance. Once the model has learned the supposedly better weights from the oversampled data, you test it on the test data you set aside with train_test_split(X,y,test_size=0.3). That split holds 30% of the 1000 samples, i.e. 300 samples, and 91 + 209 = 300, so the support you see is simply the class counts of the test set, which SMOTE never changes.
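The label counts can be checked with a minimal sketch that uses only the standard library. It rebuilds the 700/300 label distribution from the question, holds out 30%, and balances the training labels only. Note the assumptions: the seed is arbitrary, and simple padding of the minority label is a stand-in for SMOTE (real SMOTE synthesises new feature rows by interpolation; only the label counts matter here).

```python
import random
from collections import Counter

random.seed(0)  # arbitrary seed, only for repeatability

# 1000 labels: 700 ones and 300 zeros, as in the question.
labels = [1] * 700 + [0] * 300
random.shuffle(labels)

# Hold out 30% for testing, mirroring test_size=0.3.
n_test = len(labels) * 3 // 10
y_test, y_train = labels[:n_test], labels[n_test:]

test_support = Counter(y_test)
train_support = Counter(y_train)

# Stand-in for SMOTE: pad the minority class in the TRAINING labels only
# until both classes have the same count. The test labels are never touched.
majority = max(train_support.values())
y_train_balanced = list(y_train)
for label, count in train_support.items():
    y_train_balanced.extend([label] * (majority - count))
balanced_support = Counter(y_train_balanced)

print("test support:", dict(test_support))
print("train support before:", dict(train_support))
print("train support after:", dict(balanced_support))
```

The test support always sums to 300 and is identical with or without the balancing step, because the report is computed on `y_test`. The exact per-class split varies with the shuffle; passing `stratify=y` to `train_test_split` would pin it at 90 and 210.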
Code
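from imblearn.over_sampling import SMOTE
from sklearn.model_selection import train_test_split
sm = SMOTE()
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
# Oversample the training split only (fit_sample was renamed fit_resample in newer imbalanced-learn)
X_train_sampled, y_train_sampled = sm.fit_resample(X_train, y_train.ravel())
model.fit(X_train_sampled, y_train_sampled)
y_pred = model.predict(X_test)  # evaluate on the untouched test split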