I'm working on a binary classification problem and I have an SGDClassifier set up like so:
from sklearn.linear_model import SGDClassifier

sgd = SGDClassifier(
    max_iter=1000,
    tol=1e-3,
    validation_fraction=0.2,
    class_weight={0: 0.5, 1: 8.99},
)
I fitted it on my training set and plotted the precision-recall curve:
from sklearn.metrics import plot_precision_recall_curve
disp = plot_precision_recall_curve(sgd, X_test, y_test)
Given that SGDClassifier uses loss="hinge" by default, how is it possible for this curve to be plotted? My understanding is that the output of the SGD classifier is not probabilistic -- it is just 0 or 1. So there are no "thresholds", and yet the scikit-learn precision-recall curve shows a zigzagged graph over a range of thresholds. What's going on here?
The situation you describe is practically identical to one found in a documentation example, which uses the first 2 classes of the iris data and a LinearSVC classifier (that algorithm uses the squared hinge loss which, like the hinge loss you use here, results in a classifier that produces only binary outcomes, not probabilistic ones). The resulting plot there is qualitatively similar to yours.
Nevertheless, your question is a legitimate one and a nice catch indeed; how come we get behavior similar to that of probabilistic classifiers, when our classifier does not in fact produce probabilistic predictions (and hence any notion of a threshold sounds irrelevant)?
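To make the premise concrete first, here is a minimal sketch (the toy data from make_classification and the variable names are my own, not from your post) showing that a fitted hinge-loss SGDClassifier exposes no predict_proba, yet its decision_function does return continuous scores:

from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

# toy binary data, purely for illustration
X, y = make_classification(n_samples=200, random_state=0)
clf = SGDClassifier(loss="hinge", random_state=0).fit(X, y)

print(hasattr(clf, "predict_proba"))  # False: no probabilities with hinge loss
print(clf.predict(X[:3]))             # hard 0/1 labels
print(clf.decision_function(X[:3]))   # continuous signed distances to the hyperplane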
To see why this is so, we need to do some digging into the scikit-learn source code, starting from the plot_precision_recall_curve function used here and following the thread down into the rabbit hole...
Starting from the source code of plot_precision_recall_curve, we find:

y_pred, pos_label = _get_response(
    X, estimator, response_method, pos_label=pos_label)
So, for the purposes of plotting the PR curve, the predictions y_pred are not produced directly by the predict method of our classifier, but by scikit-learn's internal _get_response() function, which in turn includes the lines:
prediction_method = _check_classifier_response_method(
    estimator, response_method)
y_pred = prediction_method(X)
which finally leads us to the internal _check_classifier_response_method() function; you can check its full source code -- what is of interest here are the following 3 lines after the else statement:
predict_proba = getattr(estimator, 'predict_proba', None)
decision_function = getattr(estimator, 'decision_function', None)
prediction_method = predict_proba or decision_function
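You can mimic those three lines directly on a fitted hinge-loss classifier (a sketch, reusing the clf from the toy snippet above):

# predict_proba lookup raises AttributeError for hinge loss, so getattr
# returns the None default and the `or` falls through to decision_function
predict_proba = getattr(clf, 'predict_proba', None)          # None
decision_function = getattr(clf, 'decision_function', None)  # a bound method
prediction_method = predict_proba or decision_function
print(prediction_method)  # <bound method ... decision_function ...>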
By now, you may have started getting the point: under the hood, plot_precision_recall_curve checks whether a predict_proba() or a decision_function() method is available for the classifier used; and if predict_proba() is not available, as in your case here of an SGDClassifier with hinge loss (or the documentation example of a LinearSVC with squared hinge loss), it falls back to the decision_function() method instead, in order to compute the y_pred that will subsequently be used for plotting the PR (and ROC) curve.
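If you want to verify that this is where the thresholds come from, you can rebuild the curve by hand (a sketch, assuming the sgd, X_test and y_test from your own snippet); precision_recall_curve accepts the raw decision_function scores directly:

from sklearn.metrics import precision_recall_curve

scores = sgd.decision_function(X_test)  # continuous scores, not probabilities
precision, recall, thresholds = precision_recall_curve(y_test, scores)
# each threshold is a cut-off on the decision_function scores -- these are
# the "thresholds" behind the zigzagged plot you got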
The above arguably answers your programming question about how exactly scikit-learn produces the plot and the underlying calculations in such cases; further theoretical inquiries regarding whether (and why) using the decision_function() of a non-probabilistic classifier is indeed a correct and legitimate approach for getting a PR (or ROC) curve are out of scope for SO, and should be addressed to Cross Validated, if necessary.