I'm having some trouble using coxph(). I have two categorical variables, Sex and Probable Cause, that I want to use as predictors. Sex is just the usual male/female, but Probable Cause has 5 options. I don't understand the warning message. Why do the confidence intervals run from 0 to Inf, and why are the p-values so high?
Here's the code and the output:
> my_coxph <- coxph(Surv(tempo, status) ~ factor(Sexo) + factor(Causa.provavel), data = ceabn)
Warning message:
In fitter(X, Y, strats, offset, init, control, weights = weights, :
Loglik converged before variable 2,3,5,6 ; beta may be infinite.
> summary(my_coxph)
Call:
coxph(formula = Surv(tempo, status) ~ factor(Sexo) + factor(Causa.provavel),
data = ceabn)
n= 43, number of events= 31
                                            coef exp(coef)  se(coef)     z Pr(>|z|)
factor(Sexo)macho                      7.254e-01 2.066e+00 4.873e-01 1.488    0.137
factor(Causa.provavel)caca             2.186e+01 3.107e+09 9.698e+03 0.002    0.998
factor(Causa.provavel)colisao linha MT 1.973e+01 3.703e+08 9.698e+03 0.002    0.998
factor(Causa.provavel)indeterminado    9.407e-01 2.562e+00 1.683e+04 0.000    1.000
factor(Causa.provavel)predacao         2.170e+01 2.655e+09 9.698e+03 0.002    0.998
factor(Causa.provavel)predado          2.276e+01 7.659e+09 9.698e+03 0.002    0.998
                                       exp(coef) exp(-coef) lower .95 upper .95
factor(Sexo)macho                      2.065e+00  4.841e-01    0.7947     5.368
factor(Causa.provavel)caca             3.107e+09  3.219e-10    0.0000       Inf
factor(Causa.provavel)colisao linha MT 3.703e+08  2.701e-09    0.0000       Inf
factor(Causa.provavel)indeterminado    2.562e+00  3.904e-01    0.0000       Inf
factor(Causa.provavel)predacao         2.655e+09  3.766e-10    0.0000       Inf
factor(Causa.provavel)predado          7.659e+09  1.306e-10    0.0000       Inf
Concordance= 0.752 (se = 0.059 )
Rsquare= 0.608 (max possible= 0.987 )
Likelihood ratio test= 40.23 on 6 df, p=4.105e-07
Wald test = 7.46 on 6 df, p=0.2807
Score (logrank) test = 30.48 on 6 df, p=3.183e-05
Thank you
When I asked Terry Therneau (author of the survival package) about this several years ago, he said that the check which triggers this warning is overly sensitive, so the warning is often a false alarm. You can usually just look at your coefficients to see whether they are infinite, or effectively infinite.
In your case, however, it seems to be correctly warning you that there may be problems with your data, or with that model applied to your data, since you have implausibly large coefficients. A beta coefficient of 2.276e+01 (about 22.8) is just ridiculously high for a coefficient that gets exponentiated into a hazard ratio, and you have 4 such coefficients. The estimated relative risk, exp(22.76), is in the billions! The standard errors near 9.7e+03 and the confidence intervals running from 0 to Inf are symptoms of the same thing. You should be looking at tabular classifications of your data for problems of complete separation. Did anyone in your reference group die, er, have an event?
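To see how a factor level with no events can produce this pattern, here is a minimal sketch on made-up data. The data frame toy and its column grupo are hypothetical and are not the asker's ceabn; the sketch also assumes the survival package is installed. Every subject in the reference level is censored, so the coefficient for the other level has no finite maximum-likelihood estimate, and coxph() should stop at a large value and raise a warning of the same kind as the one in the question.

```r
library(survival)

# Made-up data: every subject in the reference level "ref" is censored
# (status = 0), so that level contributes no events at all.
toy <- data.frame(
  tempo  = c(5, 8, 12, 20, 25, 30,  3, 6, 9, 14, 18, 22),
  status = c(0, 0,  0,  0,  0,  0,  1, 1, 1,  0,  1,  1),
  grupo  = factor(rep(c("ref", "other"), each = 6),
                  levels = c("ref", "other"))
)

# Events per level: the "ref" column has a zero in the status = 1 row.
print(table(status = toy$status, grupo = toy$grupo))

# Fit the model, collecting any warnings instead of letting them print,
# so they can be inspected afterwards.
msgs <- character(0)
fit <- withCallingHandlers(
  coxph(Surv(tempo, status) ~ grupo, data = toy),
  warning = function(w) {
    msgs <<- c(msgs, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

# Show the collected warnings, then the coefficient and its standard error.
print(msgs)
print(summary(fit)$coefficients)
```

If the coefficient for grupo comes out very large with an enormous standard error, that is the same signature as the Causa.provavel rows in the question.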
Such questions are best addressed with tabulations:
table(outcome, treatment_variable, selected_categorical_covariates)
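As a rough illustration of that tabulation, the sketch below counts events per factor level and lists the levels with none. The data frame ceabn_demo and its level labels are invented stand-ins, since the real ceabn is not shown; only the column names status and Causa.provavel are taken from the question.

```r
# Invented stand-in for the asker's data; the level "ref" is hypothetical.
ceabn_demo <- data.frame(
  status         = c(0, 0, 0, 1, 1, 0, 1, 1),
  Causa.provavel = c("ref", "ref", "ref",
                     "caca", "caca", "caca",
                     "predacao", "predacao")
)

# Cross-tabulate event status (rows) against cause (columns).
# On the real data the equivalent call would be something like
#   with(ceabn, table(status, Causa.provavel))
tab <- with(ceabn_demo, table(status, Causa.provavel))
print(tab)

# Number of events (status == 1) in each level, as a named vector.
events_per_level <- tab["1", ]

# Levels with zero events are the candidates for separation problems.
no_event_levels <- names(events_per_level)[events_per_level == 0]
print(no_event_levels)
```

A zero in the event row for the reference level, or a zero in the non-event row for some other level, is the kind of cell to look for.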