python machine-learning scikit-learn probability calibration

Understanding sklearn CalibratedClassifierCV

Hi all, I am having trouble understanding how to use the output of sklearn.calibration.CalibratedClassifierCV.

I have calibrated my binary classifier using this method, and the results have improved greatly. However, I am not sure how to interpret them. The sklearn guide states that, after calibration,

the output of predict_proba method can be directly interpreted as a confidence level. For instance, a well calibrated (binary) classifier should classify the samples such that among the samples to which it gave a predict_proba value close to 0.8, approximately 80% actually belong to the positive class.

Now I would like to reduce false positives by applying a cutoff of .6 for the model to predict the label True. Without calibration, I would simply have used my_model.predict_proba() > .6. However, it seems that after calibration the meaning of predict_proba has changed, so I am not sure whether I can still do that.

From quick testing, it seems that predict and predict_proba follow the same logic I would expect before calibration. The output of:

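import pandas as pd

# my_model is the fitted CalibratedClassifierCV, valid_x the validation features
pred = my_model.predict(valid_x)
proba = my_model.predict_proba(valid_x)
pd.DataFrame({"label": pred, "proba": proba[:, 1]})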

shows that everything with a probability above .5 is classified as True, and everything below .5 as False.

Can you confirm that, after calibration, I can still use predict_proba to apply a different cutoff to identify my labels?

sklearn guide: https://scikit-learn.org/stable/modules/calibration.html#calibration

Solution

Yes, as far as I can tell you can still use predict_proba() after calibration to apply a different cutoff.
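As a minimal sketch of what that could look like: the snippet below uses synthetic data and a LogisticRegression base estimator purely as stand-ins (both are assumptions, not the asker's setup), and the helper false_positives is a hypothetical name introduced only for illustration. It compares the default predict() with a manual .6 cutoff on the calibrated positive-class probability.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic binary data standing in for the asker's dataset (assumption).
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
train_x, valid_x, train_y, valid_y = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# The base estimator is only a placeholder; any classifier could be wrapped.
my_model = CalibratedClassifierCV(
    LogisticRegression(max_iter=1000), method="sigmoid", cv=5
)
my_model.fit(train_x, train_y)

# Calibrated probability of the positive class.
# Column order follows my_model.classes_, here [0, 1].
proba_pos = my_model.predict_proba(valid_x)[:, 1]

default_pred = my_model.predict(valid_x)     # class with the highest probability
strict_pred = (proba_pos > 0.6).astype(int)  # custom .6 cutoff


def false_positives(y_true, y_pred):
    """Count samples predicted positive whose true label is negative."""
    return int(np.sum((y_pred == 1) & (y_true == 0)))


fp_default = false_positives(valid_y, default_pred)
fp_strict = false_positives(valid_y, strict_pred)
print("false positives at default cutoff:", fp_default)
print("false positives at .6 cutoff:", fp_strict)
```

Every sample flagged at the .6 cutoff is also flagged by the default prediction, so the stricter cutoff can only keep or lower the false positive count, usually at the cost of more false negatives.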

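Within CalibratedClassifierCV, as you noticed, the output of predict() is derived from the output of predict_proba(): it returns the class with the highest calibrated probability, i.e. self.classes_[np.argmax(self.predict_proba(X), axis=1)] == self.predict(X). With labels encoded as 0 and 1 this reduces to np.argmax(self.predict_proba(X), axis=1) == self.predict(X), which for a binary problem amounts to a .5 cutoff.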
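This relationship can be checked empirically. The sketch below is only an illustration: the synthetic data, the string labels "no"/"yes", and the LinearSVC base estimator are all assumptions chosen for the example. String labels are used to show why the lookup through classes_ matters when labels are not 0 and 1.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

# Synthetic data with string labels (assumption, for illustration only).
X, y = make_classification(n_samples=1000, n_features=10, random_state=1)
labels = np.where(y == 1, "yes", "no")

# LinearSVC has no predict_proba of its own; calibration adds one.
clf = CalibratedClassifierCV(LinearSVC(dual=False), cv=3).fit(X, labels)

proba = clf.predict_proba(X)

# Map the index of the most probable column back to its class label.
from_proba = clf.classes_[np.argmax(proba, axis=1)]

# True when predict() agrees with the argmax of predict_proba() on every sample.
same = np.array_equal(from_proba, clf.predict(X))
print("predict() matches argmax of predict_proba():", same)
```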

On the other hand, for the non-calibrated classifier that you pass to CalibratedClassifierCV, the above equality may or may not hold, depending on whether it is a probabilistic classifier. For example, it is not guaranteed for an SVC() classifier with probability estimates enabled: its predict() is based on the decision function, while its predict_proba() comes from a separately fitted Platt scaling, so the two can disagree.