Is there an easy way to generate a swimmer plot in R? It uses the same data as a Kaplan-Meier (KM) curve, but each individual's survival is drawn as its own line.
I've searched Stack Overflow and the R-help mailing list, and consulted Dr. Google, without finding an obvious answer, though my search technique may be suboptimal. Thank you!
**** ADDENDUM **** Apologies for not asking the question properly; this is my first time! After some experimenting, I've been able to do the following:
          OS DeathYN TreatmentGroup
4   444 days       1              0
5   553 days       1              0
8   812 days       0              0
1   844 days       0              0
10 1071 days       0              0
9  1147 days       0              0
6  1349 days       0              0
3  1375 days       0              0
2  1384 days       0              1
7  1687 days       0              0
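For anyone who wants to run the snippet on this data, here is a minimal sketch that rebuilds a data frame matching the printout. The column types (a `difftime` in days for `OS`, plain numerics for the two flags) are assumptions inferred from how the table prints, not something stated in the question.

```r
# Rebuild the printed data frame; column types are assumed from the printout
orderedData <- data.frame(
  OS = as.difftime(c(444, 553, 812, 844, 1071, 1147, 1349, 1375, 1384, 1687),
                   units = "days"),
  DeathYN = c(1, 1, 0, 0, 0, 0, 0, 0, 0, 0),
  TreatmentGroup = c(0, 0, 0, 0, 0, 0, 0, 0, 1, 0),
  row.names = c("4", "5", "8", "1", "10", "9", "6", "3", "2", "7")
)
```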
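# Colour each bar by treatment group
orderedData$GroupColor[orderedData$TreatmentGroup==0] <- "yellow"
orderedData$GroupColor[orderedData$TreatmentGroup==1] <- "red"
# barplot() returns the bar midpoints, i.e. the y position of each bar
orderedData$YCoord <- barplot(as.numeric(orderedData$OS), horiz=TRUE, col=orderedData$GroupColor, xlim=c(0,max(as.numeric(orderedData$OS)) + 50), xlab="Overall Survival (days)")
# Mark only the patients who are still alive (DeathYN==0) with a ">" (pch=62)
alive <- orderedData$DeathYN==0
points(x=20+as.numeric(orderedData$OS[alive]), y=orderedData$YCoord[alive], pch=62, col="green")
# Lines for the two groups, a symbol only for "still living"
legend(1000, 2, c("Control", "Treatment", "still living"), col=c("yellow","red","green"), lty=c(1,1,NA), lwd=c(10,10,0), pch=c(NA,NA,62))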
This gets me close enough for now, but aesthetics are not perfect. If there is a package or a better solution someone can suggest I'd love to see it!
You asked for an "easy" way to generate a swimmer plot. This is probably a bit more involved than you were hoping for, but it's pretty close to what you posted. If you need to make a lot of swimmer plots, you can tweak this into something that works for you and then turn it into a function.
First create some fake data:
library(ggplot2)
library(reshape2)
library(dplyr)
library(grid)

set.seed(33)
dat = data.frame(Subject = 1:10,
                 Months = sample(4:20, 10, replace=TRUE),
                 Treated = sample(0:1, 10, replace=TRUE),
                 Stage = sample(1:4, 10, replace=TRUE),
                 Continued = sample(0:1, 10, replace=TRUE))

# Per subject, draw optional response times (NA means no such event).
# Durable is a flag that is plotted at -0.5, just left of the bar start.
dat = dat %>%
  group_by(Subject) %>%
  mutate(Complete = sample(c(4:(max(Months)-1), NA), 1,
                           prob=c(rep(1, length(4:(max(Months)-1))), 5), replace=TRUE),
         Partial = sample(c(4:(max(Months)-1), NA), 1,
                          prob=c(rep(1, length(4:(max(Months)-1))), 5), replace=TRUE),
         Durable = sample(c(-0.5, NA), 1, replace=TRUE)) %>%
  ungroup()

# Order Subjects by Months
dat$Subject = factor(dat$Subject, levels=dat$Subject[order(dat$Months)])

# Melt part of data frame for adding points to bars
dat.m = melt(dat %>% select(Subject, Months, Complete, Partial, Durable),
             id.vars=c("Subject","Months"))
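As an aside, reshape2 is retired and only receives maintenance updates, so the same wide-to-long step can also be done with tidyr. The sketch below is one way it might look, not part of the code above; `toy` and its values are made up for illustration, and the `factor()` call is there because `pivot_longer()` returns a character column where `melt()` returns a factor.

```r
library(tidyr)

# Toy data with hypothetical values, same column layout as `dat`
toy <- data.frame(Subject  = factor(1:3),
                  Months   = c(10, 15, 8),
                  Complete = c(5, NA, 6),
                  Partial  = c(NA, 9, 4),
                  Durable  = c(-0.5, NA, NA))

# One row per subject/event pair; the column names mirror melt()'s defaults
toy.long <- pivot_longer(toy,
                         cols = c(Complete, Partial, Durable),
                         names_to = "variable",
                         values_to = "value")

# Keep the legend order that melt() would have produced
toy.long$variable <- factor(toy.long$variable,
                            levels = c("Complete", "Partial", "Durable"))
```

The row order differs from what `melt()` gives, which does not matter for plotting.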
Now for the plot:
ggplot(dat, aes(Subject, Months)) +
  # One bar per subject, filled by disease stage
  geom_bar(stat="identity", aes(fill=factor(Stage)), width=0.7) +
  # Response markers; subjects without an event have NA and are dropped with a warning
  geom_point(data=dat.m,
             aes(Subject, value, colour=variable, shape=variable), size=4) +
  # Arrow past the end of the bar for subjects who continued
  # (ggplot2 >= 3.4 prefers linewidth= over size= for lines)
  geom_segment(data=dat %>% filter(Continued==1),
               aes(x=Subject, xend=Subject, y=Months + 0.1, yend=Months + 1),
               size=0.8, arrow=arrow(type="closed", length=unit(0.1,"in"))) +
  coord_flip() +
  scale_fill_manual(values=hcl(seq(15,375,length.out=5)[1:4],100,70)) +
  scale_colour_manual(values=c(hcl(seq(15,375,length.out=3)[1:2],100,40),"black")) +
  # Upper limit of 21 leaves room for the arrows, which end at Months + 1
  scale_y_continuous(limits=c(-1,21), breaks=0:20) +
  labs(fill="Disease Stage", colour="", shape="",
       x="Subject Received Study Drug") +
  theme_bw() +
  theme(panel.grid.minor=element_blank(),
        panel.grid.major=element_blank(),
        axis.text.y=element_blank(),
        axis.ticks.y=element_blank())
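If you do end up making many of these, the plot could be wrapped in a function along the following lines. This is only a rough sketch: the name `swimmer_plot`, its arguments, and the choice to drop the manual colour scales are all assumptions for illustration. It expects a bar data frame with `Subject`, `Months`, `Stage` and `Continued` columns and a long-format event data frame with `Subject`, `value` and `variable` columns.

```r
library(ggplot2)
library(grid)  # unit(), used for the arrow head length

# Hypothetical helper, not from any package.
# bars:   one row per subject (Subject, Months, Stage, Continued)
# events: long format, one row per subject/event (Subject, value, variable)
swimmer_plot <- function(bars, events, max_months = max(bars$Months)) {
  # Order subjects by follow-up time and use the same levels in both frames
  lv <- as.character(bars$Subject)[order(bars$Months)]
  bars$Subject <- factor(as.character(bars$Subject), levels = lv)
  events$Subject <- factor(as.character(events$Subject), levels = lv)
  ongoing <- bars[bars$Continued == 1, , drop = FALSE]

  ggplot(bars, aes(Subject, Months)) +
    geom_col(aes(fill = factor(Stage)), width = 0.7) +
    geom_point(data = events,
               aes(Subject, value, colour = variable, shape = variable),
               size = 4, na.rm = TRUE) +
    geom_segment(data = ongoing,
                 aes(x = Subject, xend = Subject,
                     y = Months + 0.1, yend = Months + 1),
                 arrow = arrow(type = "closed", length = unit(0.1, "in"))) +
    coord_flip() +
    # Leave room on the right for the arrows
    scale_y_continuous(limits = c(-1, max_months + 1.5),
                       breaks = 0:ceiling(max_months)) +
    labs(fill = "Disease Stage", colour = "", shape = "",
         x = "Subject", y = "Months") +
    theme_bw() +
    theme(panel.grid = element_blank())
}
```

With the objects built earlier, `swimmer_plot(dat, dat.m)` should give a figure much like the one above, but with ggplot2's default colours.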