Tags: r, ggplot2, survival, survminer

How to get a survival table in R?


This question concerns survival data, i.e. (enter, exit, event) triples. My data is structured such that each individual id can have event = 1 at most once. I want to generate a survival table which shows, for any given age, how many individuals have experienced the event.

To get an idea of what my data is like, the survival curve looks like this:

library(survival)   # Surv(), survfit()
library(survminer)  # ggsurvplot(), ggsurvtable()

df <- structure(list(id = c("23KU", "24N7", "277J", "29Q9", "2AWB", 
"2MVW", "2RLV", "2U7E", "2WQP", "2WUW"), enter = c(0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L), exit = c(25.2676249144422, 13, 22.1409993155373, 
46.4695414099932, 31.2772073921971, 5, 29.2320328542094, 19.8329911019849, 
31.8986995208761, 31.9534565366188), event = c(0L, 1L, 0L, 0L, 
0L, 1L, 0L, 0L, 0L, 0L)), row.names = c(NA, 10L), class = "data.frame")

fit <- survfit(Surv(enter, exit, event) ~ 1, df)

ggsurvplot(fit)

[Figure: survival curve]

Clearly, not every individual experiences the event. But when I generate the risk table, the number at risk goes to 0, which I cannot fathom.

> risk_df <- ggsurvtable(fit, data = df)

> risk_df[["risk.table"]][["data"]]

   time n.risk n.event
1     0     10       0  
2     5     10       1
3    10      9       0
4    15      8       1
5    20      7       0
6    25      6       0
7    30      4       0
8    35      1       0
9    40      1       0
10   45      1       0

The column n.event clearly shows the event only occurred twice, so the number at risk should be 8 till the very end. Any explanation would be appreciated, as would any suggestion for getting the proper risk table. I don't want to plot it; I just want to export the table as a data frame to convert to LaTeX.


Solution

  • The number at risk goes to 0 because of censoring.

    You are right that the n.event column shows the event occurring only twice. In your sample data the events happen at exit = 5 and exit = 13; the table reports counts on a 5-unit grid, so the second event appears in the row for time = 15. If you look at the survival curve, survival drops at these two times, indicating that those subjects experienced the event of interest.

    However, from about time = 20 onwards, you have a series of + signs on the survival curve. These indicate that the subjects have been censored. If this is patient data, it could be because patients dropped out of the study or moved to a different state and as such are no longer enrolled. As you do not have any further information about these patients, you cannot tell how long they survive beyond the point at which they dropped out. For example, you can't tell whether the patient who dropped out at around time = 20 will go on living for another 21, 200, or 2000 time units, since they are lost to follow-up.

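    Censoring removes the censored patient from the number at risk (n.risk), but does not count it as an event. So n.risk is not "total subjects minus events"; it is the number of subjects still under observation and event-free at that time. In numbers: at time = 20, 7 patients are in the study pool and can experience the event of interest. One patient is censored between time = 20 and time = 25 (at about 22.1), leaving 6 patients at time = 25 who can still experience the event. This continues until the last patient is lost to follow-up shortly after time = 45 (at about 46.5), at which point nobody is left at risk.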

    The risk table is correct - there are no issues with it.
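
Since the question asks for the table as a data frame rather than a plot, here is a minimal sketch of one way to get it, assuming only the survival package and the sample data from the question. It calls summary() on the survfit object at a 5-unit grid (the grid is an assumption chosen to match the table above), and then cross-checks the numbers with a base-R count of who is still under observation at each time. The names times, risk_df, and manual are illustrative, not part of any package.

```r
library(survival)

# Sample data from the question: 10 subjects, all entering at time 0.
df <- data.frame(
  id    = c("23KU", "24N7", "277J", "29Q9", "2AWB",
            "2MVW", "2RLV", "2U7E", "2WQP", "2WUW"),
  enter = 0,
  exit  = c(25.2676249144422, 13, 22.1409993155373, 46.4695414099932,
            31.2772073921971, 5, 29.2320328542094, 19.8329911019849,
            31.8986995208761, 31.9534565366188),
  event = c(0L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L)
)

fit <- survfit(Surv(enter, exit, event) ~ 1, data = df)

# Evaluate the fit on a 5-unit grid (assumed here to mirror the table above).
times <- seq(0, 45, by = 5)
s <- summary(fit, times = times, extend = TRUE)

# Plain data frame: number at risk at each time, events since the previous row.
risk_df <- data.frame(time = s$time, n.risk = s$n.risk, n.event = s$n.event)

# Base-R cross-check. Because everyone enters at 0, a subject is at risk at
# time t exactly when their exit time is >= t. Events and censorings are
# counted cumulatively up to and including t.
manual <- data.frame(
  time       = times,
  n.risk     = vapply(times, function(t) sum(df$exit >= t), integer(1)),
  cum.event  = vapply(times, function(t) sum(df$event == 1L & df$exit <= t),
                      integer(1)),
  cum.censor = vapply(times, function(t) sum(df$event == 0L & df$exit <= t),
                      integer(1))
)

print(risk_df)
print(manual)
```

The cum.censor column makes the explanation above visible: n.risk keeps falling after the second event because subjects are being censored, not because they experience the event. A data frame like risk_df can then be passed to a LaTeX exporter such as xtable::xtable() or knitr::kable(risk_df, format = "latex"), if one of those packages is installed.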