python, machine-learning, scikit-learn, precision-recall, stochastic-gradient

SGD classifier Precision-Recall curve


I'm working on a binary classification problem, and I have an SGD classifier like so:

from sklearn.linear_model import SGDClassifier

sgd = SGDClassifier(
    max_iter            = 1000,
    tol                 = 1e-3,
    validation_fraction = 0.2,
    class_weight        = {0: 0.5, 1: 8.99}
)

I fitted it on my training set and plotted the precision-recall curve:

from sklearn.metrics import plot_precision_recall_curve
disp = plot_precision_recall_curve(sgd, X_test, y_test)

[plot: zigzagged precision-recall curve of the fitted SGDClassifier on the test set]

Given that the SGD classifier in scikit-learn uses loss="hinge" by default, how is it possible for this curve to be plotted? My understanding is that the output of the SGD classifier is not probabilistic -- it is either 1 or 0. So there are no "thresholds", and yet the sklearn precision-recall curve plots a zigzagged graph over a range of thresholds. What's going on here?


Solution

  • The situation you describe is practically identical to one found in a documentation example, using the first 2 classes of the iris data and a LinearSVC classifier (that algorithm uses the squared hinge loss, which, like the hinge loss you use here, results in a classifier that produces only binary outcomes and not probabilistic ones). The resulting plot there is:

    [plot: precision-recall curve from the LinearSVC documentation example]

    i.e. qualitatively similar to yours here.

    Nevertheless, your question is a legitimate one, and a nice catch indeed: how come we get behavior similar to that produced by probabilistic classifiers, when our classifier does not actually produce probabilistic predictions (and hence any notion of a threshold sounds irrelevant)?

    To see why this is so, we need to do some digging into the scikit-learn source code, starting from the plot_precision_recall_curve function used here and following the thread down into the rabbit hole...

    Starting from the source code of plot_precision_recall_curve, we find:

    y_pred, pos_label = _get_response(
        X, estimator, response_method, pos_label=pos_label)
    

    So, for the purposes of plotting the PR curve, the predictions y_pred are not produced directly by the predict method of our classifier, but by the _get_response() internal function of scikit-learn.
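
    As a quick sanity check of why the predict method alone could not produce such a curve, here is a minimal sketch (synthetic data via make_classification, purely for illustration):

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import SGDClassifier

    X, y = make_classification(n_samples=200, random_state=0)
    sgd = SGDClassifier(loss="hinge", random_state=0).fit(X, y)

    # predict returns hard 0/1 labels only; there is nothing to threshold
    print(np.unique(sgd.predict(X)))  # [0 1]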

    _get_response() in turn includes the lines:

    prediction_method = _check_classifier_response_method(
        estimator, response_method)
    
    y_pred = prediction_method(X)
    

    which finally leads us to the _check_classifier_response_method() internal function; you can check its full source code, but what is of interest here are the following 3 lines after the else statement:

    predict_proba = getattr(estimator, 'predict_proba', None)
    decision_function = getattr(estimator, 'decision_function', None)
    prediction_method = predict_proba or decision_function
    
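    You can verify this attribute lookup with a fitted classifier; a minimal sketch (again with synthetic data, purely for illustration):

    from sklearn.datasets import make_classification
    from sklearn.linear_model import SGDClassifier

    X, y = make_classification(random_state=0)
    sgd = SGDClassifier(loss="hinge").fit(X, y)

    # With hinge loss, accessing predict_proba raises AttributeError, so
    # getattr with a None default returns None, and the `or` falls through
    # to decision_function:
    predict_proba = getattr(sgd, 'predict_proba', None)
    decision_function = getattr(sgd, 'decision_function', None)
    prediction_method = predict_proba or decision_function
    print(prediction_method.__name__)  # decision_function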

    By now, you may have started getting the point: under the hood, plot_precision_recall_curve checks whether either a predict_proba() or a decision_function() method is available for the classifier used. If predict_proba() is not available, as in your case of an SGDClassifier with hinge loss (or the documentation example of a LinearSVC with squared hinge loss), it falls back to the decision_function() method instead, in order to calculate the y_pred that will subsequently be used for plotting the PR (and ROC) curve.
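
    In other words, you can reproduce the plotted curve yourself by feeding the decision_function scores (rather than the hard predictions) into precision_recall_curve. A minimal self-contained sketch, with synthetic imbalanced data standing in for yours:

    from sklearn.datasets import make_classification
    from sklearn.linear_model import SGDClassifier
    from sklearn.metrics import precision_recall_curve
    from sklearn.model_selection import train_test_split

    # Imbalanced synthetic stand-in for the question's data (illustration only)
    X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    sgd = SGDClassifier(loss="hinge", random_state=0).fit(X_train, y_train)

    # The "thresholds" here are cut-offs on the signed distance to the
    # separating hyperplane, not on probabilities:
    scores = sgd.decision_function(X_test)
    precision, recall, thresholds = precision_recall_curve(y_test, scores)

    Plotting precision against recall from these arrays gives a zigzagged curve of exactly the kind you saw.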


    The above arguably answers your programming question about how exactly scikit-learn produces the plot and the underlying calculations in such cases; further theoretical inquiries regarding whether & why using the decision_function() of a non-probabilistic classifier is indeed a correct and legitimate approach to getting a PR (or ROC) curve are out of scope for SO, and should be addressed to Cross Validated, if necessary.