I have a binary classification task, where I fit the model using XGBClassifier classifier and try to predict ’1’ and ‘0’ using the test set. In this task I have a very unbalanced data majority ‘0‘ and minority ‘1’ at training data (of coarse the same in the test set). My data looks like this:
F1 F2 F3 …. Target
S1 2 4 5 …. 0
S2 2.3 4.3 6.4 1
… … … …. ..
S4000 3 6 7 0
I used the following code to train the model and calculate the roc value:
my_cls=XGBClassifier()
X=mydata_train.drop(['target'])
y= mydata_train['target']
x_tst=mydata_test.drop['target']
y_tst= mydata_test['target']
my_cls.fit(X, y)
pred= my_cls.predict_proba(x_tst)[:,1]
auc_score=roc_auc_score(y_tst,pred)
The above code gives me a value as auc_score, but it seems this value is for one class using this my_cls.predict_proba(x_tst)[:,1], If I change it to my_cls.predict_proba(x_tst)[:,0], it gives me another value as auc value. My first question is how can I directly get the weighted average for auc? My second question is how to select the right cut point to build the confusion matrix having the unbalanced data? This is because by default the classifier uses 50% as the threshold to build the matrix, but since my data is very unbalanced it seems we need to select a right threshold. I need to count TP and FP thats why I need to have this cut point.
If I use weight class to train the model, does it handle the problem (I mean can I use the 50% cut point by default)? For example some thing like this:
My_clss_weight=len(X) / (2 * np.bincount(y))
Then try to fit the model with this:
my_cls.fit(X, y, class_weight= My_clss_weight)
However the above code my_cls.fit(X, y, class_weight= My_clss_weight) does not work with XGBClassifier and gives me error. This works with LogessticRegression, but I want to apply with XGBClassifier! any idea to handle the issues?
To answer your first question, you can simply use the parameter weighted of the roc_auc_score function.
For example -
roc_auc_score(y_test, pred, average = 'weighted')
To answer the second half of your question, can you please elaborate a bit. I can help you with that.