I am attempting to create a Cox proportional hazards model with only one explanatory variable. To perform a likelihood ratio test, I know I need a full and reduced model. I also know that a full model would be a separate mean for each group, and a reduced model would use an overall mean for the whole data set. How can I ensure I am setting this up properly in R? In this model z is 1 if the patient had heart surgery, and z is 0 otherwise
I have:
model<-coxph(Surv(time,delta)~z,method='breslow',data=heartdata)
X.lr <- 2*(model$loglik[2]-model$loglik[1])
Does this achieve that? I get an answer I just want to know whether this makes a full vs. reduced model since I don't have other variables to use?
In this case, this does work, but I think there's a better/more transparent solution using update()
and anova()
(I wasn't even aware that the log-likelihood component of coxph
models included both the full and the null deviances).
Using a built-in data set from the survival
package:
## drop NAs so we are using the same data set for full & reduced models
lungna <- na.omit(lung)
## fit full model
m1 <- coxph(Surv(time, status) ~ ph.ecog, data=lungna)
## update model to fit intercept only (` ~ 1 ` replaces the RHS of the formula):
## ~ 1 means "intercept only" in R formula notation
m0 <- update(m1, . ~ 1)
## anova() runs a likelihood-ratio test
anova(m0,m1)
Results:
Analysis of Deviance Table
Cox model: response is Surv(time, status)
Model 1: ~ 1
Model 2: ~ ph.ecog
loglik Chisq Df P(>|Chi|)
1 -508.12
2 -501.91 12.409 1 0.0004273 ***
You can confirm that 2*diff(m1$loglik)
gives 12.409, the same value reported by anova()
for the deviance ("Chisq") difference, and that pchisq(chisq_val, df = 1, lower.tail = FALSE)
gives the reported p-value.