rtruncationsurvivalggfortify

2 survival functions, 1 left-truncated, 1 not truncated. How create survival function in R that assumes same experience over the truncated interval?


I have two survival functions, one is not truncated so I have experience for all time periods. The other is left-truncated until t = 4, so it has no experience until t > 4. I can plot the two together in the following code in R using the survival package.

library(tidyverse)
library(survival)
library(ggfortify)

# create two survival functions
set1 <- tibble(start0 = rep(0,10), end0 = 1:10, event0 = rep(1,10))
set2 <- tibble(start0 = rep(4,10), end0 = c(5, 5, 7, 9, rep(10, 6)), event0 = rep(1,10))
combined_set <- bind_rows(set1, set2)
survival_fn <- survfit(Surv(start0, end0, event0) ~ start0, data = combined_set)

# plot the survival function:
autoplot(survival_fn, conf.int = FALSE)

I would like to show the difference in survival between the two functions if they had both experienced the same survival experience during the truncation period - i.e. up to t = 4. I've manually sketched the approximate graph I am trying to achieve (size of steps not to scale).

This is a simplified example - in practice I have eight different sets of data with different truncation periods, and around 2000 data-points in each set.

desired output


Solution

  • If you look at the structure of the survival_fn object (which is not a function but rather a list), you see:

    str(survival_fn)
    List of 17
     $ n        : int [1:2] 10 10
     $ time     : num [1:14] 1 2 3 4 5 6 7 8 9 10 ...
     $ n.risk   : num [1:14] 10 9 8 7 6 5 4 3 2 1 ...
     $ n.event  : num [1:14] 1 1 1 1 1 1 1 1 1 1 ...
     $ n.censor : num [1:14] 0 0 0 0 0 0 0 0 0 0 ...
     $ surv     : num [1:14] 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 ...
     $ std.err  : num [1:14] 0.105 0.158 0.207 0.258 0.316 ...
     $ cumhaz   : num [1:14] 0.1 0.211 0.336 0.479 0.646 ...
     $ std.chaz : num [1:14] 0.1 0.149 0.195 0.242 0.294 ...
     $ strata   : Named int [1:2] 10 4
      ..- attr(*, "names")= chr [1:2] "start0=0" "start0=4"
     $ type     : chr "counting"
     $ logse    : logi TRUE
     $ conf.int : num 0.95
     $ conf.type: chr "log"
     $ lower    : num [1:14] 0.732 0.587 0.467 0.362 0.269 ...
     $ upper    : num [1:14] 1 1 1 0.995 0.929 ...
     $ call     : language survfit(formula = Surv(start0, end0, event0) ~ start0, data = combined_set)
     - attr(*, "class")= chr "survfit"
    

    So one way of getting something like you goal although still with an automatic start to the survival function at (t=0,S=1) would be to multiply all the $surv items in the 'start0=4'-stratum by the surv value at t=4, and then redo the plot:

    survival_fn[['surv']][11:14] <- survival_fn[['surv']][11:14]*survival_fn[['surv']][4]
    

    enter image description here

    I can see why this might not be a totally conforming answer since there is still a blue line from 1 out to t=5 and it doesn't actually start at the surv value for stratum 1 at t=4. That is however a limitation of using a "high-level" abstraction plotting paradigm. The customizability is inhibited by the many "helpful" assumptions built into the plotting grammar. It would not be as difficult to do this in base plotting since you could "move things around" without as many constraints.

    If you do need to build a step unction from estimated survial proportions and times you might look at this answer and then build an augmented dataset with a y at time=4 adjustment for the later stratum. You would need to add a time=0 value of for the main stratum and a time=4 value of the first stratum for the second stratum as well as dong the adjustment as shown above. See this question and answer. Reconstruct survival curve from coordinates