Tags: r, survival-analysis, cox-regression, survival

How to include a time-dependent coefficient for a categorical covariate in a Cox survival model in R?


I am building a Cox PH model using the survival package in R and would like to include a time-dependent coefficient for my categorical variable. Reproducible data setup:

library(survival)
# Data
stanford <- stanford2
stanford$age_cat <- ifelse(stanford$age > 35, "old", "young")

Working from the time-dependent vignette for the survival package [1], I understand that I need to use the tt() function. My first attempt showed that the covariate needs to be dummy coded.

mod.fail <- coxph(Surv(time, status) ~ tt(age_cat),
                  data = stanford,
                  tt = function(x, t, ...) x*t)
#> Error in x * t : non-numeric argument to binary operator

So I added an indicator variable.

# Create dummy coding of age_cat
stanford$age_cat_d <- ifelse(stanford$age_cat == "old", 1, 0)

Now I am not sure how to specify the model properly. Both of the models below run, but I do not know which one correctly lets the effect of the age category vary over time.

# Model 1
mod.t1 <- coxph(Surv(time, status) ~ tt(age_cat_d),
               data = stanford,
               tt = function(x, t, ...) x*t)
# Model 2
mod.t2 <- coxph(Surv(time, status) ~ age_cat_d + tt(age_cat_d),
                data = stanford,
                tt = function(x, t, ...) x*t)

Below is how I think the effect of the age category at time = 200 should be computed in each model. The results show that the two models differ.

# Model 1
coef(mod.t1)[1]*200
#> tt(age_cat_d) 
#>    0.04425679
# Model 2
coef(mod.t2)[1]+coef(mod.t2)[2]*200
#> age_cat_d 
#> 0.5424105 

So, is either of the above models the correct way to implement a time-dependent coefficient for the age category? The examples in the linked vignette (and in other guides to tt() that I have found) focus on time-dependent coefficients for continuous variables. (Note: the above example is just for reproducibility; I am not arguing that such a time-dependent model is appropriate for these data.)

[1]: https://cran.r-project.org/web/packages/survival/vignettes/timedep.pdf


Solution

  • tt() declares the transformation for a time-varying coefficient in the same way whether your covariate is continuous or discrete. The real question is therefore which model you are fitting when you drop the "main term" from a time-varying coefficient Cox model, and how to interpret the parameter estimates in each case.

    The easiest way to answer this is probably to go through different model specifications (via syntax) and explain what they're doing.

    Setup

    library(survival)
    
    # Data
    stanford <- stanford2
    stanford$age_b <- ifelse(stanford$age > 35, 1, 0) # add binary covariate
    
    # Function computing time functional form
    myfun <- function(x, t, ...){ x * log(t + 20)}
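
The three specifications fitted in the sections that follow can be written out as hazard models. This is only a notational sketch: the symbols h_0, beta, beta_1 and beta_2 are labels introduced here for illustration and do not appear in the original answer, and x stands for whichever covariate is used (age or age_b). The term log(t + 20) comes from myfun above.

```latex
\begin{align*}
\text{Model 1:}\quad h(t \mid x) &= h_0(t)\,\exp\{\beta x\} \\
\text{Model 2:}\quad h(t \mid x) &= h_0(t)\,\exp\{\beta_1 x + \beta_2\, x \log(t + 20)\} \\
\text{Model 3:}\quad h(t \mid x) &= h_0(t)\,\exp\{\beta_2\, x \log(t + 20)\}
\end{align*}
```

Under this notation, the log hazard ratio for a one-unit increase in x is beta in Model 1, beta_1 + beta_2*log(t + 20) in Model 2, and beta_2*log(t + 20) in Model 3. Model 3 is Model 2 with beta_1 fixed at zero.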
    

    Models with a Continuous Covariate

    Model 1 Continuous: continuous covariate, time-constant coefficient

    Age has one time-invariant effect.

    coxph(Surv(time, status) ~ age, data = stanford)
    #> Call:
    #> coxph(formula = Surv(time, status) ~ age, data = stanford)
    #> 
    #>        coef exp(coef) se(coef)     z       p
    #> age 0.02917   1.02960  0.01064 2.741 0.00613
    #> 
    #> Likelihood ratio test=8.27  on 1 df, p=0.004034
    #> n= 184, number of events= 113
    

    Model 2 Continuous: continuous covariate, adding time-varying coefficient

    The effect of age now also varies across time. The total effect of age is decomposed into a time-invariant term (the coefficient on age) and a time-varying term (the coefficient on tt(age)). The total effect of age in this example is -.007 + .007*log(t+20) based on the function used for tt(). This interpretation is provided in the time-varying coefficient vignette.

    coxph(Surv(time, status) ~ age + tt(age),
                tt =  myfun, 
                data = stanford)
    #> Call:
    #> coxph(formula = Surv(time, status) ~ age + tt(age), data = stanford, 
    #>     tt = myfun)
    #> 
    #>              coef exp(coef)  se(coef)      z     p
    #> age     -0.007256  0.992770  0.042434 -0.171 0.864
    #> tt(age)  0.007182  1.007208  0.008190  0.877 0.381
    #> 
    #> Likelihood ratio test=9.04  on 2 df, p=0.01086
    #> n= 184, number of events= 113
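
To make the decomposition concrete, the total effect of age can be evaluated at a few time points from the fitted coefficients. This is a minimal sketch rather than part of the original answer: the object names (fit, b, times, beta_t) and the chosen time points are arbitrary illustrations, and the coefficient name "tt(age)" is taken from the printed output above.

```r
library(survival)

stanford <- stanford2
myfun <- function(x, t, ...) x * log(t + 20)

# Same model as above: main term plus time-varying term
fit <- coxph(Surv(time, status) ~ age + tt(age),
             tt = myfun,
             data = stanford)

b <- coef(fit)

# Illustrative time points (assumption: chosen only for demonstration)
times <- c(50, 200, 1000)

# Total log hazard ratio per year of age at each time point
beta_t <- unname(b["age"] + b["tt(age)"] * log(times + 20))

data.frame(time = times, log_hr = beta_t, hr = exp(beta_t))
```

The log_hr column is the time-specific coefficient for age, and hr is the corresponding hazard ratio per additional year of age at that time.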
    

    Model 3 Continuous: continuous covariate, remove base term

    Similar to Model 2, we're letting the effect of age vary with time. However, we no longer are separately estimating the time-varying component and the time-invariant component. Instead, we're directly estimating the total effect of age, which can vary across time. The total effect of age is .006*log(t+20).

    coxph(Surv(time, status) ~ tt(age),
          tt =  myfun, 
          data = stanford)
    #> Call:
    #> coxph(formula = Surv(time, status) ~ tt(age), data = stanford, 
    #>     tt = myfun)
    #> 
    #>             coef exp(coef) se(coef)     z       p
    #> tt(age) 0.005829  1.005846 0.002046 2.849 0.00439
    #> 
    #> Likelihood ratio test=9.02  on 1 df, p=0.002677
    #> n= 184, number of events= 113
    

    Models with a Binary Covariate

    Now let's try to fit these models with a binary covariate instead of a continuous one. The coefficient estimates change but they still represent the same concepts with respect to time.

    Model 1 Binary: binary covariate, time-constant coefficient

    Same as Model 1 Continuous: age has one time-invariant effect. Now instead of that effect being the effect of a 1-unit change in continuous age, it's the effect of being old rather than young.

    coxph(Surv(time, status) ~ age_b, data = stanford)
    #> Call:
    #> coxph(formula = Surv(time, status) ~ age_b, data = stanford)
    #> 
    #>         coef exp(coef) se(coef)     z     p
    #> age_b 0.2721    1.3128   0.2304 1.181 0.238
    #> 
    #> Likelihood ratio test=1.47  on 1 df, p=0.2258
    #> n= 184, number of events= 113
    

    Model 2 Binary: binary covariate, adding time-varying coefficient

    Same as Model 2 Continuous: the effect of age now also varies across time. The total effect of age is decomposed into a time-invariant term (the coefficient on age_b) and a time-varying term (the coefficient on tt(age_b)). The total effect of age in this example is .025 + .047*log(t+20) based on the function used for tt(). That is the effect of being old rather than young.

    coxph(Surv(time, status) ~ age_b + tt(age_b),
                tt =  myfun, 
                data = stanford)
    #> Call:
    #> coxph(formula = Surv(time, status) ~ age_b + tt(age_b), data = stanford, 
    #>     tt = myfun)
    #> 
    #>              coef exp(coef) se(coef)     z     p
    #> age_b     0.02475   1.02506  0.92143 0.027 0.979
    #> tt(age_b) 0.04680   1.04791  0.16956 0.276 0.783
    #> 
    #> Likelihood ratio test=1.54  on 2 df, p=0.4621
    #> n= 184, number of events= 113
    

    Model 3 Binary: binary covariate, remove base term

    As in Model 3 Continuous, we now estimate the total time-varying effect of being old vs. young directly rather than decomposing it into time-varying and time-invariant components. The total effect of being old rather than young is .051*log(t+20).

    coxph(Surv(time, status) ~ tt(age_b),
          tt =  myfun, 
          data = stanford)
    #> Call:
    #> coxph(formula = Surv(time, status) ~ tt(age_b), data = stanford, 
    #>     tt = myfun)
    #> 
    #>              coef exp(coef) se(coef)     z     p
    #> tt(age_b) 0.05121   1.05255  0.04239 1.208 0.227
    #> 
    #> Likelihood ratio test=1.54  on 1 df, p=0.2142
    #> n= 184, number of events= 113
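
The two time-varying specifications for the binary covariate can be compared by computing the implied hazard ratio of old vs. young at a few time points. This is a hedged sketch, not part of the original answer: the object names (fit2, fit3, g, log_hr2, log_hr3) and the time points are illustrative assumptions, and the coefficient names are taken from the printed output above.

```r
library(survival)

stanford <- stanford2
stanford$age_b <- ifelse(stanford$age > 35, 1, 0)
myfun <- function(x, t, ...) x * log(t + 20)

# Model 2 Binary: main term plus time-varying term
fit2 <- coxph(Surv(time, status) ~ age_b + tt(age_b),
              tt = myfun, data = stanford)

# Model 3 Binary: time-varying term only
fit3 <- coxph(Surv(time, status) ~ tt(age_b),
              tt = myfun, data = stanford)

# Illustrative time points (assumption: chosen only for demonstration)
times <- c(10, 100, 1000)
g <- log(times + 20)  # the time function used inside myfun

# Log hazard ratio of old vs. young at each time point
log_hr2 <- unname(coef(fit2)["age_b"] + coef(fit2)["tt(age_b)"] * g)
log_hr3 <- unname(coef(fit3)["tt(age_b)"] * g)

data.frame(time = times,
           hr_model2 = exp(log_hr2),
           hr_model3 = exp(log_hr3))
```

Model 3 forces the log hazard ratio to be a pure multiple of log(t + 20), while Model 2 allows an additional constant offset, so the two columns will generally differ.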