Short introduction:
A Cox Proportional Hazards (PH) model can be estimated with the coxph
function of the survival
package. An obvious requirement to get sensible results from this type of model is that the hazards are proportional, i.e., they are constant over time. If this is not the case for a certain variable, it can be solved by making the coefficient of this variable time varying. (Now it is technically an extended Cox model.) This is done by adding the tt()
to that variable and specifying a function over time (see vignette("timedep", package = "survival")
page 19+).
Question:
Which function is used if tt()
is used without specifying a function?
Here is an example:
library(survival)
data(lung)
cox_model <- coxph(Surv(time, status) ~ age + sex + ph.karno, data = lung)
cox_model_ph <- cox.zph(cox_model)
# rho chisq p
# age 0.00701 0.00871 0.92566
# sex 0.12249 2.42336 0.11954
# ph.karno 0.23135 8.24167 0.00409
# GLOBAL NA 11.54750 0.00911
We see that ph.karno
violates the PH assumption (small p-value), so add tt()
:
cox_model_tt <- coxph(Surv(time, status) ~ age + sex + tt(ph.karno), data = lung)
cox_model_tt_ph <- cox.zph(cox_model_tt)
# rho chisq p
# age -0.00907 0.0142 0.9052
# sex 0.12844 2.7270 0.0987
# tt(ph.karno) 0.11643 2.3846 0.1225
# GLOBAL NA 5.0220 0.1702
Now the PH assumption is satisfied, but I have no idea what the tt()
function actually did. I tried some common used function such as tt = function(x, t, ...) x*t
, tt = function(x, t, ...) x + t
, tt = function(x, t, ...) x*log(t)
. But all gave different results (and were unable to fix the PH violation).
Any help is appreciated.
Looking through the code for coxph
I think if I have found it. You offered no value for the 'tt'-parameter, so I think this gets executed:
if (is.null(tt)) {
tt <- function(x, time, riskset, weights) {
obrien <- function(x) {
r <- rank(x)
(r - 0.5)/(0.5 + length(r) - r)
}
unlist(tapply(x, riskset, obrien))
}
And here's an experimental confirmation:
> cox_model_OB <- coxph(Surv(time, status) ~ age + sex + tt(ph.karno), data = lung, tt= function(x, time, riskset, weights) {
+ obrien <- function(x) {
+ r <- rank(x)
+ (r - 0.5)/(0.5 + length(r) - r)
+ }
+ unlist(tapply(x, riskset, obrien))
+ }
+ )
> ( cox_model_tt_ph <- cox.zph(cox_model_tt) )
rho chisq p
age -0.00907 0.0142 0.9052
sex 0.12844 2.7270 0.0987
tt(ph.karno) 0.11643 2.3846 0.1225
GLOBAL NA 5.0220 0.1702
I'm wondering if this was intentional. I suspect it is code left in during a development session. I suspect that Therneau intends that failure to offer a 'tt'-function should throw at least a warning, but probably would have preferred an error.
So that was a guess and I found that I was wrong by searching through the vignettes and find that it is intended: "This relies on the fact that the input arguments to tt() are ordered by the event number or risk set. This function is used as a default if no tt argument is present in the coxph call, but there are tt terms in the model formula. (Doing so allowed me to depreciate the survobrien function)." ref: page 23 of "Using Time Dependent Covariates and Time Dependent Coefficients in the Cox Model"from the current survival package Index help page links to Vignettes.