r, glm, confidence-interval, binomial-coefficients

How to get a “95% CI for a Binomial Proportion” using glm in R


I am following the page Interval Estimation for a Binomial Proportion Using glm in R to get the “asymptotic” 95% CI. I thought the binomial family did not have an “identity” link, yet the following program returns an answer. Why? Any advice would be appreciated.

In this example, n = 10 and x = 2, and I want the asymptotic 95% CI:

binom.ci <- function(x, n, alpha = 0.05, link = "logit") {
  z <- qnorm(1 - alpha / 2)
  family <- binomial(link = link)
  fit <- summary(glm(cbind(x, n - x) ~ 1, family = family))
  est <- fit$coefficients[, "Estimate"]
  se <- fit$coefficients[, "Std. Error"]
  family$linkinv(est + c(-z, z) * se)
}

binom.ci(x=2,n=10,link="identity")
# [1] -0.04791801  0.44791801

Solution

  • It is possible to have an "identity" link function with binomial, but it doesn't make much sense here. We can see what's going on inside the function if we replicate the glm call with the arguments in your example.

    fit <- summary(glm(cbind(2, 8) ~ 1, family = binomial("identity")))
    fit
    #> 
    #> Call:
    #> glm(formula = cbind(2, 8) ~ 1, family = binomial("identity"))
    #> 
    #> Deviance Residuals: 
    #> [1]  0
    #> 
    #> Coefficients:
    #>             Estimate Std. Error z value Pr(>|z|)
    #> (Intercept)   0.2000     0.1265   1.581    0.114
    #> 
    #> (Dispersion parameter for binomial family taken to be 1)
    #> 
    #>     Null deviance: 0  on 0  degrees of freedom
    #> Residual deviance: 0  on 0  degrees of freedom
    #> AIC: 4.3947
    #> 
    #> Number of Fisher Scoring iterations: 2
    

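    With the "identity" link, the intercept is estimated directly on the probability scale, so the interval is simply the estimate plus or minus z times its standard error: 0.2 ± 1.96 × 0.1265. This is the normal-approximation (Wald) interval, and nothing constrains it to lie between 0 and 1. Taken literally, our 95% confidence interval includes negative probabilities, which doesn't make any sense. With a logit link, the interval is built on the log-odds scale and then back-transformed with linkinv, so both limits stay between 0 and 1.

    So yes, we can use binomial(link = "identity") here without it throwing an error, but with a sample this small the interval it returns is not a sensible one for a binomial proportion. Use the logit link instead.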
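To make the comparison concrete, here is a minimal sketch that computes the normal-approximation interval by hand and sets it beside the identity-link and logit-link results. It reuses the data from the question (x = 2, n = 10). The names `binom_ci`, `wald_ci`, `identity_ci` and `logit_ci` are made up for this illustration, and `binom_ci` is just the question's function restated so the block runs on its own.

```r
# Data from the question: x successes out of n trials.
x <- 2
n <- 10
z <- qnorm(0.975)  # two-sided 95% critical value

# Normal-approximation (Wald) interval computed by hand on the probability scale.
p_hat <- x / n
wald_ci <- p_hat + c(-z, z) * sqrt(p_hat * (1 - p_hat) / n)

# The question's helper, restated: fit an intercept-only binomial GLM,
# build the interval on the link scale, then map it back with linkinv.
binom_ci <- function(x, n, alpha = 0.05, link = "logit") {
  z <- qnorm(1 - alpha / 2)
  family <- binomial(link = link)
  fit <- summary(glm(cbind(x, n - x) ~ 1, family = family))
  est <- fit$coefficients[, "Estimate"]
  se <- fit$coefficients[, "Std. Error"]
  family$linkinv(est + c(-z, z) * se)
}

identity_ci <- binom_ci(x, n, link = "identity")
logit_ci <- binom_ci(x, n, link = "logit")

print(wald_ci)      # hand calculation
print(identity_ci)  # identity link: interval built directly on the probability scale
print(logit_ci)     # logit link: interval built on the log-odds scale, then back-transformed

# Check whether the identity-link interval agrees with the hand calculation
# (a loose tolerance allows for the iterative fit).
print(all.equal(identity_ci, wald_ci, tolerance = 1e-6))
```

The identity-link interval should match the hand calculation and reproduce the negative lower limit shown in the question, whereas the logit-link limits pass through the inverse logit and therefore cannot leave the range 0 to 1.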