I am currently working on a Kaplan-Meier-Plot in R using the survival
package and I am planning on creating and designing the plot using the survminer
package afterwards.
The data I have is from an excel table:
data <- read_excel("worksheet".xlsx", na = "NA", sheet = 1)
is what I used to import the data.
The structure of the data (I manipulated it a bit before) is as following:
gropd_df [37 × 5] (S3: grouped_df/tbl_df/tbl/data.frame)
$ n : num [1:37] 1 1 1 1 15 40 5 6 1 311 ...
$ Rec : num [1:37] 0 0 0 0 0 1 0 0 0 105 ...
$ nonRec : num [1:37] 1 1 1 1 15 39 5 6 1 206 ...
$ time : num [1:37] 3 6 6 6 7.5 8 8 8.5 9 9.9...
I now want to create a Survivalobject using Surv()
and then later the kaplan-meier estimate with survfit()
from the Survivalobject:
so <- Surv(data$time, data$Rec)
.
The variable n
stands for the number of people observed in a study, time
stands for the time after an operation (time = 0 on operations day) where the patients had a follow-up examination of their disease. If the disease did come back after the operation (determined in the follow-up examination), the patient was counted in Rec
, if the disease did not reoccur, he/she was counted as nonRec
.
As you can see, the events are not displayed as time
and status
with status being "0" or "1", but as time
and number of people being "0" (nonRec
) and number of people being "1" (Rec
). Therefore, creating the survivalobject above does (obviously) not work out how I want:
so <- Surv(data$time, data$Rec)
Warning message:
In Surv(data$time, data$Rec) : Invalid status value, converted to NA
Does anyone know a smart solution to solve this? Changing the data manually does not work out given the size of the data. Also, I am sorry if this question may seem dumb to you but I startet using R just a few weeks ago and I`m by far not a pro in using it..
Help would be very welcome, thanks for reading :)
Update following @jpsmith comment (hope this is right):
dput(FUP)
structure(list(n = c(1, 1, 1, 1, 15, 40, 5, 6, 1, 311, 20, 7,
5, 60, 25, 25, 62, 228, 1, 12, 29, 24, 86, 41, 200, 5, 3, 30,
3, 30, 237, 41, 162, 10, 14, 27, 1), Rec = c(0, 0, 0, 0, 0, 1,
0, 0, 0, 105, 1, 0, 2, 1, 0, 0, 6, 6, 0, 1, 9, 2, 15, 3, 22,
0, 0, 1, 0, 4, 23, 10, 0, 0, 4, 2, 0), nonRec = c(1, 1, 1, 1,
15, 39, 5, 6, 1, 206, 19, 7, 3, 59, 25, 25, 56, 222, 1, 11, 20,
22, 71, 38, 178, 5, 3, 29, 3, 26, 214, 31, 162, 10, 10, 25, 1
), fuptime = c(3, 6, 6, 6, 7.5, 8, 8, 8.5, 9, 9.9, 10, 10, 12,
12, 12, 12, 12, 12, 12, 12, 12.3, 13.9, 14, 15.2, 18, 21, 24.2,
25, 28, 30, 30, 35.3, 36, 36, 36, 48.6, 66), operation3 = c("12-laser",
"12-laser", "12-laser", "12-laser", "12-laser", "12-laser", "12-laser",
"12-laser", "12-laser", "12-laser", "12-laser", "12-laser", "12-laser",
"12-laser", "12-laser", "12-laser", "12-laser", "12-laser", "12-laser",
"12-laser", "12-laser", "12-laser", "12-laser", "12-laser", "12-laser",
"12-laser", "12-laser", "12-laser", "12-laser", "12-laser", "12-laser",
"12-laser", "12-laser", "12-laser", "12-laser", "12-laser", "12-laser"
)), class = c("grouped_df", "tbl_df", "tbl", "data.frame"), row.names = c(NA,
-37L), groups = structure(list(fuptime = c(3, 6, 7.5, 8, 8.5,
9, 9.9, 10, 12, 12.3, 13.9, 14, 15.2, 18, 21, 24.2, 25, 28, 30,
35.3, 36, 48.6, 66), .rows = structure(list(1L, 2:4, 5L, 6:7,
8L, 9L, 10L, 11:12, 13:20, 21L, 22L, 23L, 24L, 25L, 26L,
27L, 28L, 29L, 30:31, 32L, 33:35, 36L, 37L), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -23L), .drop = TRUE))
As I said in my comment, uncount()
is a good way to convert your data to one-observation-per-participant. For example:
library(tidyverse)
fup1 <- fup %>%
uncount(n) %>%
mutate(Status = ifelse(Rec == 0, 0, 1)) %>%
select(-Rec, -nonRec) %>%
ungroup()
fup1
# A tibble: 1,770 × 3
fuptime operation3 Status
<dbl> <chr> <dbl>
1 3 12-laser 0
2 6 12-laser 0
3 6 12-laser 0
4 6 12-laser 0
5 7.5 12-laser 0
6 7.5 12-laser 0
7 7.5 12-laser 0
8 7.5 12-laser 0
9 7.5 12-laser 0
10 7.5 12-laser 0
# … with 1,760 more rows
Note that I've converted your Rec
and nonRec
columns to a single indicator column, Status
.
Now it's easy to fit your Kaplan-Meier model:
library(survival)
survfit(Surv(fuptime, Status) ~ operation3, data=fup1)
Call: survfit(formula = Surv(fuptime, Status) ~ operation3, data = fup1)
n events median 0.95LCL 0.95UCL
[1,] 1770 1497 14 14 15.2