machine-learning scikit-learn logistic-regression log-likelihood

How does the decision function of Logistic Regression in scikit-learn work?


I am trying to understand how this function works and the mathematics behind it. Does decision_function() in scikit-learn give us log odds? The function returns values ranging from minus infinity to infinity, and it seems that 0 is the prediction threshold when we use decision_function(), whereas the threshold is 0.5 when we use predict_proba(). This is exactly the relationship between probability and log odds described on GeeksforGeeks.

I couldn't find anything about this in the documentation, but I think the function behaves like a log-likelihood. Am I right?
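
For example, on toy data the two thresholds seem to line up (a rough sketch; make_classification, the sample sizes and the variable names here are only for illustration):

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    # toy binary problem, just to illustrate the behaviour I mean
    X, y = make_classification(n_samples=200, n_features=5, random_state=0)
    clf = LogisticRegression().fit(X, y)

    d = clf.decision_function(X)    # unbounded real-valued scores
    p = clf.predict_proba(X)[:, 1]  # probabilities in (0, 1)

    # d > 0 exactly where p > 0.5
    print(np.array_equal(d > 0, p > 0.5))  # True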


Solution

  • The decision function is nothing but the value of (as you can see in the source)

    f(x) = <w, x> + b
    

    while predict_proba is (as you can see in the source)

    p(x) = exp(f(x)) / [exp(f(x)) + exp(-f(x))] = 1 / (1 + exp(-2 f(x)))
    

    which, up to the factor of 2 in the exponent, is just the regular sigmoid function.

    Consequently, the corresponding threshold points will be 0 for f(x), and 0.5 for p(x), since

    exp(0) / [exp(0) + exp(-0)] = 1 / 2 = 0.5
    

    So how do you interpret the decision function? Up to a factor of 2, it is the logit (log-odds) of the probability modeled by the LR model: with the parametrisation above, logit(p(x)) = 2 f(x). (The factor of 2 comes from a trick scikit-learn uses so that it can always apply softmax instead of handling the binary sigmoid separately, which is unfortunate.) A short sketch below illustrates these relationships.
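
Here is a minimal sketch of the relationships above on synthetic data (make_classification, the sample sizes and variable names are just assumptions for illustration; whether logit(p) equals f(x) or 2·f(x) depends on the scikit-learn version and the multi_class setting, so the last line checks both):

    import numpy as np
    from scipy.special import logit
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    X, y = make_classification(n_samples=200, n_features=5, random_state=0)
    clf = LogisticRegression().fit(X, y)

    # decision_function is just the linear score f(x) = <w, x> + b
    d = clf.decision_function(X)
    f = X @ clf.coef_.T + clf.intercept_
    print(np.allclose(d, f.ravel()))  # True

    # relation between the score and the log-odds of predict_proba:
    # either logit(p) = f(x) (plain sigmoid) or logit(p) = 2 f(x)
    # (softmax parametrisation), depending on version/settings
    p = clf.predict_proba(X)[:, 1]
    print(np.allclose(logit(p), d), np.allclose(logit(p), 2 * d))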