I've just started using the survival
and survminer
packages in R and am trying to understand its output. In the code below I create a dataframe with the first 12 rows of my actual dataset, as representative of the issue/question. In this representative data:
Running length(which(testDF$status == 2))/nrow(testDF)
shows a death rate of 66.67% with this data, but the survival probability curves shown in the image below end at 0%. Should they not be ending at 66.67% at least for the average of all the data? What am I doing wrong here or am I misinterpreting survival probability?
Code:
library(ggplot2)
library(survival)
library(survminer)
testDF <- data.frame(
ID = 1:12,
time = c(0,34,0,12,12,21,0,0,39,11,13,26),
status = c(1,2,1,2,2,2,1,1,2,2,2,2),
node = c("C","C","B","A","C","C","B","C","B","C","A","B")
)
fit <- survfit(Surv(time, status) ~ node, data = testDF)
ggsurvplot(fit,
pval = TRUE,
conf.int = TRUE,
linetype = "strata",
surv.median.line = "hv",
ggtheme = theme_bw()
)
# percentage of deaths
length(which(testDF$status == 2))/nrow(testDF)
My coding of censored (no death, the survivors) observations in the "time" column with 0's was incorrect as Edward points out in his comments. Now I recode those survivor observations with the time length of the study of 40. I also re-run the plot without confidence intervals for solution clarity.
testDF <- data.frame(
ID = 1:12,
time = c(40,34,40,12,12,21,40,40,39,11,13,26), # 40 month study window (0's for no death changed to 40)
status = c(0,1,0,1,1,1,0,0,1,1,1,1), # 0 = censored, 1 = death
node = c("C","C","B","A","C","C","B","C","B","C","A","B")
)
# survival rates: total = 33.3%, node A = 0%, node B = 50%, node C = 33.3%
length(which(testDF$status == 0))/nrow(testDF)
length(which(testDF$status == 0 & testDF$node == "A"))/length(which(testDF$node == "A"))
length(which(testDF$status == 0 & testDF$node == "B"))/length(which(testDF$node == "B"))
length(which(testDF$status == 0 & testDF$node == "C"))/length(which(testDF$node == "C"))
Plot running the above revised DF: