r plot survival-analysis

swimmer survival plot


Is there an easy way to generate a swimmer plot in R? It uses the same data as a Kaplan-Meier (KM) curve, but with each individual's survival drawn as a separate line. Example:

I've searched Stack Overflow, the R-help mailing list, and consulted Dr. Google without finding an obvious answer, though my search technique may be suboptimal. Thank you!

**** ADDENDED **** Apologies for not appropriately asking a question - this is my first time! Playing around, I've been able to do the following:

          OS DeathYN TreatmentGroup
4   444 days       1              0
5   553 days       1              0
8   812 days       0              0
1   844 days       0              0
10 1071 days       0              0
9  1147 days       0              0
6  1349 days       0              0
3  1375 days       0              0
2  1384 days       0              1
7  1687 days       0              0

# Color bars by treatment group
orderedData$GroupColor <- ifelse(orderedData$TreatmentGroup == 1, "red", "yellow")
os <- as.numeric(orderedData$OS)
# barplot() returns the bar midpoints, used below as y coordinates
orderedData$YCoord <- barplot(os, horiz=TRUE, col=orderedData$GroupColor, xlim=c(0, max(os) + 50), xlab="Overall Survival (days)")
# Mark only subjects who are still living (DeathYN == 0) with ">" (pch=62)
alive <- orderedData$DeathYN == 0
points(x=20 + os[alive], y=orderedData$YCoord[alive], pch=62, col="green")
legend(1000, 2, c("Control", "Treatment", "still living"), col=c("yellow","red","green"), lty=c(1,1,NA), lwd=c(10,10,NA), pch=c(NA,NA,62))

This gets me close enough for now, but aesthetics are not perfect. If there is a package or a better solution someone can suggest I'd love to see it!


Solution

  • You asked for an "easy" way to generate a swimmer plot. This is probably a bit more involved than you were hoping for, but it's pretty close to what you posted. If you need to make a lot of swimmer plots, you can tweak this into something that works for you and then turn it into a function.

    First create some fake data:

    library(ggplot2)
    library(reshape2)
    library(dplyr)
    library(grid)
    
    set.seed(33)
    dat = data.frame(Subject = 1:10, 
                     Months = sample(4:20, 10, replace=TRUE),
                     Treated=sample(0:1, 10, replace=TRUE),
                     Stage = sample(1:4, 10, replace=TRUE),
                     Continued=sample(0:1, 10, replace=TRUE))
    
    # Per subject, draw optional response times (NA = no such response)
    dat = dat %>%
      group_by(Subject) %>%
      mutate(Complete=sample(c(4:(max(Months)-1),NA), 1, 
                             prob=c(rep(1, length(4:(max(Months)-1))),5), replace=TRUE),
             Partial=sample(c(4:(max(Months)-1),NA), 1, 
                            prob=c(rep(1, length(4:(max(Months)-1))),5), replace=TRUE),
             Durable=sample(c(-0.5,NA), 1, replace=TRUE)) %>%
      ungroup()  # drop the grouping so later steps see a plain data frame
    
    # Order Subjects by Months
    dat$Subject = factor(dat$Subject, levels=dat$Subject[order(dat$Months)])
    
    # Melt part of data frame for adding points to bars
    dat.m = melt(dat %>% select(Subject, Months, Complete, Partial, Durable),
                 id.vars=c("Subject","Months"))
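
    Since reshape2 is now retired, the same long-format table can also be built with tidyr. This is only a minimal sketch, assuming the tidyr package is installed; `toy` is a small made-up stand-in for `dat`, and `dat.long` is a hypothetical name. The factor step matters because `melt()` returns `variable` as a factor in column order, while `pivot_longer()` returns a character column that ggplot2 would sort alphabetically.

    ```r
    library(tidyr)

    # Toy stand-in for `dat` (values are made up for illustration)
    toy <- data.frame(Subject  = factor(1:3),
                      Months   = c(6, 12, 18),
                      Complete = c(NA, 5, 9),
                      Partial  = c(4, NA, 7),
                      Durable  = c(-0.5, NA, -0.5))

    # One row per subject/response pair, using the same column names
    # ("variable", "value") that melt() produces by default
    dat.long <- pivot_longer(toy,
                             cols = c(Complete, Partial, Durable),
                             names_to = "variable",
                             values_to = "value")

    # Keep the legend order the same as the column order
    dat.long$variable <- factor(dat.long$variable,
                                levels = c("Complete", "Partial", "Durable"))
    ```

    The row order differs from `melt()` output, which should not matter for plotting.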
    

    Now for the plot:

    ggplot(dat, aes(Subject, Months)) +
      geom_bar(stat="identity", aes(fill=factor(Stage)), width=0.7) +
      geom_point(data=dat.m, 
                 aes(Subject, value, colour=variable, shape=variable), size=4) +
      geom_segment(data=dat %>% filter(Continued==1), 
                 aes(x=Subject, xend=Subject, y=Months + 0.1, yend=Months + 1), 
                 linewidth=0.8,  # use size=0.8 on ggplot2 < 3.4.0
                 arrow=arrow(type="closed", length=unit(0.1,"in"))) +
      coord_flip() +
      scale_fill_manual(values=hcl(seq(15,375,length.out=5)[1:4],100,70)) +
      scale_colour_manual(values=c(hcl(seq(15,375,length.out=3)[1:2],100,40),"black")) +
      # Upper limit of 21 leaves room for the arrows, which end at Months + 1
      scale_y_continuous(limits=c(-1,21), breaks=0:20) +
      labs(fill="Disease Stage", colour="", shape="", 
           x="Subject Received Study Drug") +
      theme_bw() +
      theme(panel.grid.minor=element_blank(),
            panel.grid.major=element_blank(),
            axis.text.y=element_blank(),
            axis.ticks.y=element_blank())
    

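
    To follow up on the suggestion of turning this into a function, here is one possible sketch. `swimmer_plot` and its argument names are hypothetical (they do not come from any package). It assumes a data frame with one row per subject and no missing values in the flag column, and it covers only the bars and the continuation arrows, not the response markers.

    ```r
    library(ggplot2)
    library(grid)  # for unit()

    # Hypothetical helper: `time`, `group` and `continued` are column names
    # given as strings; the `continued` column should hold 0/1 flags.
    swimmer_plot <- function(data, time, group, continued) {
      # Order subjects by follow-up time so the bars form a staircase
      data$.id    <- factor(seq_len(nrow(data)), levels = order(data[[time]]))
      data$.time  <- data[[time]]
      data$.group <- factor(data[[group]])
      # Subjects still being followed get an arrow past the end of the bar
      ongoing <- data[data[[continued]] == 1, , drop = FALSE]

      ggplot(data, aes(.id, .time)) +
        geom_bar(stat = "identity", aes(fill = .group), width = 0.7) +
        geom_segment(data = ongoing,
                     aes(x = .id, xend = .id, y = .time + 0.1, yend = .time + 1),
                     arrow = arrow(type = "closed", length = unit(0.1, "in"))) +
        coord_flip() +
        labs(x = "Subject", y = time, fill = group) +
        theme_bw()
    }

    # Usage with made-up data
    toy <- data.frame(Months    = c(12, 5, 18),
                      Stage     = c(1, 2, 1),
                      Continued = c(1, 0, 1))
    p <- swimmer_plot(toy, "Months", "Stage", "Continued")
    ```

    Printing `p` draws the plot. Scales and theme tweaks from the code above can still be added with `+`.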