I have a large dataset in long format with repeated measurements of patients, each identified by an ID. Every measurement is recorded at a timepoint and on a date, which is stored in a Date variable. I also record whether or not each ID experienced an "event"; the date of the event is stored in another Date variable. I'm using ggplot2 to draw a plot of the measurements over time for every single ID, and I want to add a vertical line at the date when the "event" happened. I first filter the data for the ID whose graph I want to draw, then add a vline at the event date. However, when I add the vline, I get a line for every eventdate, including those of IDs that are not in the filtered data.
Here is some sample data (my real data has many more IDs):
library(tidyverse)
sampledata <- structure(list(ID = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2), Measure1 = c(10, 20, 0, 30, 20, 10, 2, 0, 0, 0), timepoint = c(1, 2, 3, 4, 5, 1, 2, 3, 4, 5), time = structure(c(18628, 18748, 18840, 18932, 19024, 19205, 19297, 19024, 19113, 19205), class = "Date"), event = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1), eventdate = structure(c(18779, 18779, 18779, 18779, 18779, 19024, 19024, 19024, 19024, 19024), class = "Date")), row.names = c(NA, 10L), class = "data.frame")
Here is the graph for ID 1:
filter(sampledata, ID %in% 1 & Measure1 != "NA") %>%
ggplot(aes(x = time, y = Measure1)) +
geom_line(size = 0.3, linetype = "solid") +
geom_point(size = 2, color = "#0073C2FF") +
geom_vline(xintercept = as.numeric(as.Date(sampledata$eventdate)), linetype = 1) +
theme_gray() +
theme(text = element_text(size = 12), axis.text = element_text(size = 8), legend.position = "none", axis.title.y = element_blank()) +
labs(y = "ylab", x = "Follow up") +
scale_x_date(date_labels = "%Y-%m-%d", date_breaks = "2 months")
As you can see, I get a vertical line for ID 1's eventdate (2021-06-01), but I also get a line for ID 2's eventdate (2022-02-01).
I guess I'm doing something wrong when filtering. Any idea how I can get the graph with only the vline for the selected ID? (My next step is to loop over all the IDs to draw the same graph for each one, so I do not want to hard-code anything.)
Thank you!
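The issue is that you passed the eventdate column from your unfiltered dataset sampledata to xintercept. sampledata$eventdate refers to the original data frame, not to the filtered data piped into ggplot(). Hence you get a vline for each eventdate in the unfiltered data.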
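To see the difference, here is a minimal sketch that compares the dates the question passes to xintercept with the dates of the filtered rows. It rebuilds only the ID and eventdate columns of the sample data and assumes dplyr is installed.

```r
library(dplyr)

# Minimal copy of the question's ID and eventdate columns
sampledata <- data.frame(
  ID = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
  eventdate = structure(c(18779, 18779, 18779, 18779, 18779,
                          19024, 19024, 19024, 19024, 19024), class = "Date")
)

# What the question passes to xintercept: dates from every row of the unfiltered data
all_dates <- unique(sampledata$eventdate)
print(all_dates)  # two dates, one per ID

# What the plot should use: only the rows that belong to the filtered ID
id1_dates <- unique(filter(sampledata, ID == 1)$eventdate)
print(id1_dates)  # a single date
```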
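To fix this, map the date as an aesthetic instead, i.e. use aes(xintercept = eventdate). Inside aes() the column is looked up in the layer's data, which by default is the filtered data piped into ggplot(). Additionally, even after doing so you would still plot several identical vlines, because event and eventdate are repeated on every row of an ID. To fix this I use data = ~ distinct(.x, event, eventdate), which applies distinct() to the plot data so that the vline layer only gets the unique combinations of event and eventdate.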
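A small sketch of what the layer with data = ~ distinct(.x, event, eventdate) receives, again using only the relevant columns of the sample data. rlang::as_function() is used here to turn the formula into a function, which is roughly how ggplot2 documents its handling of a formula passed as data; the names id1, vline_fun and vline_data are illustrative.

```r
library(dplyr)

# Minimal copy of the question's ID, event and eventdate columns
sampledata <- data.frame(
  ID = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
  event = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1),
  eventdate = structure(c(18779, 18779, 18779, 18779, 18779,
                          19024, 19024, 19024, 19024, 19024), class = "Date")
)

# The data piped into ggplot() in the answer: only the rows of ID 1
id1 <- filter(sampledata, ID == 1)
nrow(id1)  # 5 rows, so aes(xintercept = eventdate) alone would draw 5 identical lines

# Turn the formula into a function and apply it to the plot data
vline_fun <- rlang::as_function(~ distinct(.x, event, eventdate))
vline_data <- vline_fun(id1)
print(vline_data)  # one row: event 1, eventdate 2021-06-01
```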
library(tidyverse)
sampledata <- structure(list(ID = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2), Measure1 = c(10, 20, 0, 30, 20, 10, 2, 0, 0, 0), timepoint = c(1, 2, 3, 4, 5, 1, 2, 3, 4, 5), time = structure(c(18628, 18748, 18840, 18932, 19024, 19205, 19297, 19024, 19113, 19205), class = "Date"), event = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1), eventdate = structure(c(18779, 18779, 18779, 18779, 18779, 19024, 19024, 19024, 19024, 19024), class = "Date")), row.names = c(NA, 10L), class = "data.frame")
filter(sampledata, ID == 1, !is.na(Measure1)) %>%
ggplot(aes(x = time, y = Measure1)) +
geom_line(size = 0.3, linetype = "solid") +
geom_point(size = 2, color = "#0073C2FF") +
geom_vline(data = ~ distinct(.x, event, eventdate), aes(xintercept = eventdate), linetype = 1) +
theme_gray() +
theme(text = element_text(size = 12), axis.text = element_text(size = 8), legend.position = "none", axis.title.y = element_blank()) +
labs(y = "ylab", x = "Follow up") +
scale_x_date(date_labels = "%Y-%m-%d", date_breaks = "2 months")
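Since the next step is to loop over all IDs, one possible sketch is to wrap the fixed plot in a function and call it once per ID. The function name plot_id is hypothetical, the theme() settings are left out for brevity, and size is dropped from geom_line() because recent ggplot2 versions prefer linewidth for lines.

```r
library(tidyverse)

sampledata <- structure(list(ID = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2), Measure1 = c(10, 20, 0, 30, 20, 10, 2, 0, 0, 0), timepoint = c(1, 2, 3, 4, 5, 1, 2, 3, 4, 5), time = structure(c(18628, 18748, 18840, 18932, 19024, 19205, 19297, 19024, 19113, 19205), class = "Date"), event = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1), eventdate = structure(c(18779, 18779, 18779, 18779, 18779, 19024, 19024, 19024, 19024, 19024), class = "Date")), row.names = c(NA, 10L), class = "data.frame")

# Hypothetical helper: builds the plot for one ID, taking the vline date
# from that ID's rows only (via the filtered plot data)
plot_id <- function(data, id) {
  data %>%
    filter(ID == id, !is.na(Measure1)) %>%
    ggplot(aes(x = time, y = Measure1)) +
    geom_line(linetype = "solid") +
    geom_point(size = 2, color = "#0073C2FF") +
    # one vline per unique event date of the filtered ID
    geom_vline(data = ~ distinct(.x, event, eventdate),
               aes(xintercept = eventdate), linetype = 1) +
    labs(y = "ylab", x = "Follow up", title = paste("ID", id)) +
    scale_x_date(date_labels = "%Y-%m-%d", date_breaks = "2 months")
}

# One ggplot object per ID, nothing hard-coded
plots <- lapply(unique(sampledata$ID), plot_id, data = sampledata)
```

Each element of plots is a ggplot object, so it can be printed or passed to ggsave() if the graphs should be written to files.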