r logistic-regression spss linear-discriminant

Possible to force logistic regression or another classifier through a specific probability?


I have a data set with a binary variable [Yes/No] and a continuous variable (X). I'm trying to build a model that classifies [Yes/No] from X.

From my data set, when X = 0.5, 48% of the observations are Yes. However, I know the true probability of Yes should be 50% when X = 0.5. When I fit a logistic regression, the predicted P[Yes] at X = 0.5 is not 0.5.

How can I correct this? I assume all the predicted probabilities will be slightly off if the curve does not pass through the correct point.

Is it correct just to add a bunch of observations in my sample to adjust the proportion?

It does not have to be logistic regression; LDA, QDA, etc. are also of interest.

I have searched Stack Overflow, but only found topics regarding linear regression.


Solution

  • I believe that in R (assuming you're using glm from base R) you just need

    glm(y ~ I(x - 0.5) - 1, data = your_data, family = binomial)
    

    The I(x-0.5) term recenters the covariate at 0.5, and the -1 suppresses the intercept. The linear predictor is then exactly 0 at x = 0.5, so the predicted probability is exactly 0.5 there.

    For example:

    set.seed(101)
    dd <- data.frame(x=runif(100,0.5,1),y=rbinom(100,size=1,prob=0.7))
    m1 <- glm(y~I(x-0.5)-1,data=dd,family=binomial)
    predict(m1,type="response",newdata=data.frame(x=0.5)) ## 0.5
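    If you are not working in R, the same trick carries over directly: center the covariate at 0.5 and fit a logistic regression with no intercept. The sketch below (my own illustration, not from the question's data) implements this with plain numpy via Newton's method, mirroring the glm call above; with the intercept suppressed, the linear predictor at x = 0.5 is zero, so the predicted probability there is exactly 0.5 regardless of the fitted coefficient.

    ```python
    import numpy as np

    # Simulated data, analogous to the R example (not the asker's data set).
    rng = np.random.default_rng(101)
    x = rng.uniform(0.5, 1.0, 100)
    y = rng.binomial(1, 0.7, 100)

    z = x - 0.5          # recentered covariate; deliberately no intercept column
    beta = 0.0
    for _ in range(25):  # Newton-Raphson for the single coefficient
        p = 1.0 / (1.0 + np.exp(-beta * z))
        grad = np.sum((y - p) * z)           # score
        hess = np.sum(p * (1.0 - p) * z**2)  # observed information
        beta += grad / hess

    def predict(x_new):
        """Predicted P(Yes) under the constrained model."""
        return 1.0 / (1.0 + np.exp(-beta * (x_new - 0.5)))

    print(predict(0.5))  # exactly 0.5 by construction
    ```

    The constraint holds by construction, not by luck: whatever value beta converges to, the term beta * (x - 0.5) vanishes at x = 0.5, and the logistic function of 0 is 0.5.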