rsurvival-analysissurvival

Plot number at risk using mstate::Cuminc()


The following data comes with a competing risk scenario.
I want to plot the cumulative incidence and add a risk table for both groups using the base R function plot()

library(mstate)
data(aidssi)

si <- aidssi
ci <- Cuminc(si$time, si$status, group = si$ccr5)

ci.WW <- ci[ci$group == "WW", ]
ci.WM <- ci[ci$group == "WM", ]

plot(c(0, ci.WW$time), c(0, ci.WW$CI.1),
  type = "n",
  xlim = c(0, 14),
  ylim = c(0, 0.5),
  xlab = "Time",
  ylab = "Cumulative incidence"
)
lines(c(0, ci.WW$time), c(0, ci.WW$CI.1), type = "s", col = "black", lty = 1, lwd = 1)
lines(c(0, ci.WM$time), c(0, ci.WM$CI.1), type = "s", col = "grey", lty = 1, lwd = 1)

I have tried to create the risk table manually as follows. This works, but the number at risk per group is artificial and does not correspond to the actual data.


risk_WW <- c(100, 80, 60, 40, 20, 10, 5, 1)
risk_WM <- c(100, 75, 50, 30, 15, 8, 3, 2) 
time_points <- c(0, 2, 4, 6, 8, 10, 12, 14) 

par(mar = c(4, 5, 3, 3)) 
par(oma = c(5, 0, 0, 0)) 

mtext("WW", adj = 0, side = 1, line = 4, at = -2, cex = 1, col = "black") 
mtext(risk_WW, side = 1, line = 4, at = time_points, cex = 1, col = "black")  

mtext("WM", adj = 0, side = 1, line = 6, at = -2, cex = 1, col = "grey") 
mtext(risk_WM, side = 1, line = 6, at = time_points, cex = 1, col = "black") 

Is there a way to determine the number at risk per group, preferably without recording them manually?
I am aware of using {ggsurvfit} but want to give the base plot a shot.


Solution

  • The number of people at risk at time t is the number of people without an event at time t and not censored before. Thus it is the number of people with a time of event/censorship at time t or after. We can retrieve this with data as follows :

    NRisk <- by(si$time, si$ccr5, \(time) {
       do.call("c", lapply(time_points, \(tp) sum(time >= tp)))
    })
    

    The by function will apply the function (time) {...} to si$time in each group of si$ccr5. Then lapply will sum for each time_points the number of values of time (si$time in the group of ccr5) that are greater or equal to that time_point. do.call("c", ...) will transform the list given by lapply to a vector.

    The numbers at risk in each group is then :

    NRisk[["WW"]]
    [1] 259 237 187 123  83  51  29   0
    NRisk[["WM"]]
    [1] 65 61 52 46 39 30 21  0
    

    As an alternative, we can use the survfit function from package survival also :

    Km <- survfit(Surv(time, I(status != 0)) ~ ccr5, data = si)
    summary(Km, times = time_points, extend = TRUE)
    

    This gives the same result on my computer.

    If I made a mistake don't hesitate to tell me where I didn't understand.