I've data with binary outcome and binary dependent variables. The problem is the zero events presence for almost one variable.
I have to do binary logistic regression with OR (95% CI) estimate and I can't due to zero events.
I've tried exact-like inference in logistic regression models (elrm
R package) with no result.
how can I solve it?
here's the code for the elrm:
> x <- xtabs(~dead + interaction(volo_1, NACA_cat), data = db)
> x
interaction(volo_1, NACA_cat)
morto 0.0 1.0 0.1 1.1
0 2340 273 303 81
1 0 0 86 21
> cdat <- cdat <- data.frame(volo_1 = rep(0:1, 2), NACA_cat = rep(0:1, each = 2), admit = x[2, ], ntrials = colSums(x))
> cdat
volo_1 NACA_cat admit ntrials
0.0 0 0 0 2340
1.0 1 0 0 273
0.1 0 1 86 389
1.1 1 1 21 102
> m.volo_1 <- elrm(formula = admit/ntrials ~ volo_1, interest = ~volo_1, iter = 22000, dataset = cdat, burnIn = 2Progress: 100%
Generation of the Markov Chain required 3 secs
Conducting inference ...
Inference required 0 secs
> ## summary of model including estimates and CIs
> summary(m.volo_1)
Call:
elrm(formula = admit/ntrials ~ volo_1, interest = ~volo_1, iter = 22000, dataset = cdat, burnIn = 2000)
Results:
estimate p-value p-value_se mc_size
volo_1 0.63666 0.02295 0.00166 20000
95% Confidence Intervals for Parameters
lower upper
volo_1 0.06414017 1.352148
>
>
>
> m.NACA_cat <- elrm(formula = admit/ntrials ~ NACA_cat, interest = ~NACA_cat, iter = 22000, dataset = cdat, burnIn = 2Progress: 100%
Generation of the Markov Chain required 4 secs
Conducting inference ...
Inference required 0 secs
Warning message:
'NACA_cat' observed value of the sufficient statistic was not sampled
> ## summary of model including estimates and CIs
> summary(m.NACA_cat)
Call:
elrm(formula = admit/ntrials ~ NACA_cat, interest = ~NACA_cat, iter = 22000, dataset = cdat, burnIn = 2000)
Results:
estimate p-value p-value_se mc_size
NACA_cat NA 0 0 20000
95% Confidence Intervals for Parameters
lower upper
NACA_cat NA NA
here's the code for the binary logistic regression:
> full.model <- glm(morto ~ volo_1 + NACA_cat , data = db,family=binomial())
> logistic.display(full.model)
Logistic regression predicting morto : 1 vs 0
crude OR(95%CI) adj. OR(95%CI) P(Wald's test) P(LR-test)
volo_1: 1 vs 0 1.82 (1.12,2.98) 0.91 (0.53,1.56) 0.741 0.739
NACA_cat: 1 vs 0 647257526.44 (0,Inf) 653272824.33 (0,Inf) 0.972 < 0.001
Log-likelihood = -257.3593
No. of observations = 3104
AIC value = 520.7187
EDIT: Here is a MRE with fewer records but the same problem:
db<- data.frame( morto = c(0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
volo_1 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0),
NACA_dic = c(0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0))
db$volo_1<- as.factor(db$volo_1)
db$morto<- as.factor(db$morto)
db$NACA_dic<- as.factor(db$NACA_dic)
full.model <- glm(morto ~ volo_1 + NACA_dic , data = db,family=binomial())
logistic.display(full.model)
x <- xtabs(~morto + interaction(volo_1, NACA_dic), data = db)
cdat <- cdat <- data.frame(volo_1 = rep(0:1, 2), NACA_dic = rep(0:1, each = 2),
admit = x[2, ], ntrials = colSums(x))
cdat
m.volo_1 <- elrm(formula = admit/ntrials ~ volo_1, interest = ~volo_1, iter = 22000,
dataset = cdat, burnIn = 2000)
summary(m.volo_1)
m.NACA_dic <- elrm(formula = admit/ntrials ~ NACA_dic, interest = ~NACA_dic, iter = 22000,
dataset = cdat, burnIn = 2000)
summary(m.NACA_cat)
Here is the problem:
> m.NACA_dic <- elrm(formula = admit/ntrials ~ NACA_dic, interest = ~NACA_dic, iter = 22000,
+ dataset = cdat, burnIn = 2Progress: 100%
Generation of the Markov Chain required 3 secs
Conducting inference ...
Inference required 0 secs
Warning message:
'NACA_dic' observed value of the sufficient statistic was not sampled
> summary(m.NACA_cat)
Call:
elrm(formula = admit/ntrials ~ NACA_cat, interest = ~NACA_cat, iter = 22000, dataset = cdat, burnIn = 2000)
Results:
estimate p-value p-value_se mc_size
NACA_cat NA 0 0 20000
95% Confidence Intervals for Parameters
lower upper
NACA_cat NA NA
SOLVED:
following the suggestions of @BenBolker I implemented Bias Reduction in Binomial-Response Generalized Linear Models with the brglm
package
> model<-brglm(morto ~ volo_1 + NACA_dic , data = db,family=binomial, model = TRUE, method = "brglm.fit",
+ pl = FALSE, x = FALSE, y = TRUE, contrasts = NULL)
>
> summary(model)
Call:
brglm(formula = morto ~ volo_1 + NACA_dic, family = binomial,
data = db, model = TRUE, method = "brglm.fit", pl = FALSE,
x = FALSE, y = TRUE, contrasts = NULL)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -4.5591 1.4144 -3.223 0.00127 **
volo_11 -1.3200 0.7845 -1.683 0.09244 .
NACA_dic1 4.6583 1.4325 3.252 0.00115 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 138.942 on 133 degrees of freedom
Residual deviance: 92.711 on 131 degrees of freedom
Penalized deviance: 90.11255
AIC: 98.711
> exp(coef(model))
(Intercept) volo_11 NACA_dic1
0.01047146 0.26712352 105.45428986
> exp(confint(model))
Profiling the ordinary deviance for the corresponding ML fit...
Profiling the penalized deviance for the supplied fit...
Calculating confidence intervals for the ML fit using deviance profiles...
Calculating confidence intervals for the BR fit using penalized likelihood profiles...
2.5 % 97.5 %
(Intercept) 0.00000000 0.07274017
volo_11 0.03224977 1.06917625
NACA_dic1 14.10469036 Inf
> coef(summary(model))[,'Pr(>|z|)']
(Intercept) volo_11 NACA_dic1
0.001266712 0.092437418 0.001146851