The following data comes with a competing risk scenario.
I want to plot the cumulative incidence and add a risk table for both groups using the base R function plot()
library(mstate)
data(aidssi)
si <- aidssi
ci <- Cuminc(si$time, si$status, group = si$ccr5)
ci.WW <- ci[ci$group == "WW", ]
ci.WM <- ci[ci$group == "WM", ]
plot(c(0, ci.WW$time), c(0, ci.WW$CI.1),
type = "n",
xlim = c(0, 14),
ylim = c(0, 0.5),
xlab = "Time",
ylab = "Cumulative incidence"
)
lines(c(0, ci.WW$time), c(0, ci.WW$CI.1), type = "s", col = "black", lty = 1, lwd = 1)
lines(c(0, ci.WM$time), c(0, ci.WM$CI.1), type = "s", col = "grey", lty = 1, lwd = 1)
I have tried to create the risk table manually as follows. This works, but the number at risk per group is artificial and does not correspond to the actual data.
risk_WW <- c(100, 80, 60, 40, 20, 10, 5, 1)
risk_WM <- c(100, 75, 50, 30, 15, 8, 3, 2)
time_points <- c(0, 2, 4, 6, 8, 10, 12, 14)
par(mar = c(4, 5, 3, 3))
par(oma = c(5, 0, 0, 0))
mtext("WW", adj = 0, side = 1, line = 4, at = -2, cex = 1, col = "black")
mtext(risk_WW, side = 1, line = 4, at = time_points, cex = 1, col = "black")
mtext("WM", adj = 0, side = 1, line = 6, at = -2, cex = 1, col = "grey")
mtext(risk_WM, side = 1, line = 6, at = time_points, cex = 1, col = "black")
Is there a way to determine the number at risk per group, preferably without recording them manually?
I am aware of using {ggsurvfit} but want to give the base plot a shot.
The number of people at risk at time t is the number of people without an event at time t and not censored before. Thus it is the number of people with a time of event/censorship at time t or after. We can retrieve this with data as follows :
NRisk <- by(si$time, si$ccr5, \(time) {
do.call("c", lapply(time_points, \(tp) sum(time >= tp)))
})
The by function will apply the function (time) {...} to si$time in each group of si$ccr5. Then lapply will sum for each time_points the number of values of time (si$time in the group of ccr5) that are greater or equal to that time_point. do.call("c", ...) will transform the list given by lapply to a vector.
The numbers at risk in each group is then :
NRisk[["WW"]]
[1] 259 237 187 123 83 51 29 0
NRisk[["WM"]]
[1] 65 61 52 46 39 30 21 0
As an alternative, we can use the survfit function from package survival also :
Km <- survfit(Surv(time, I(status != 0)) ~ ccr5, data = si)
summary(Km, times = time_points, extend = TRUE)
This gives the same result on my computer.
If I made a mistake don't hesitate to tell me where I didn't understand.