Consider the df below, in which there are different sequences of events and the individuals' ages when they occur (A:E). If an event does not occur, the corresponding column receives NA. The proportion of each sequence is given in the column prop.
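df <- data.frame(
  seq_id = seq(1, 5),
  seq = c("A>B>D", "A>B", "A>D>B", "A>C>E", "A>B>E"),
  # random proportions: your values will differ from the printout below
  prop = sample(seq(5, 50), 5, replace = TRUE),
  # age at which each event occurs (NA = event did not occur)
  A = c(41, 38, 60, 45, 47),
  B = c(42, 40, 68, NA, 52),
  C = c(NA, NA, NA, 50, NA),
  D = c(45, NA, 62, NA, NA),
  E = c(NA, NA, NA, 78, 80)
)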
df
seq_id seq prop A B C D E
1 1 A>B>D 13 41 42 NA 45 NA
2 2 A>B 8 38 40 NA NA NA
3 3 A>D>B 35 60 68 NA 62 NA
4 4 A>C>E 5 45 NA 50 NA 78
5 5 A>B>E 37 47 52 NA NA 80
I want to obtain a plot similar to the one from https://doi.org/10.1016/j.jbi.2018.11.002, where only the events that occur are displayed on each trajectory line.
I appreciate your help on this.
To achieve your desired result, the first step is to reshape the data to long format; then you can create the chart using geom_point and geom_path. The rest is adding the labels and as much or as little styling as you like.
Note: Instead of the sequence id, I opted to put the sequence itself on the y axis.
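library(tidyverse)

# Reshape to long format: one row per (sequence, event) pair
df_long <- df |>
  pivot_longer(-c(seq_id, seq, prop), names_to = "event") |>
  # drop events that did not occur
  filter(!is.na(value)) |>
  # geom_path() joins points in row order, so sort by age within each sequence
  arrange(seq_id, value) |>
  # order the y axis by seq_id (first level at the bottom)
  mutate(seq = fct_inorder(seq))

df_long |>
  ggplot(aes(value, seq, color = event)) +
  # one path per sequence, with an arrow head at its last event
  geom_path(
    aes(group = seq),
    arrow = arrow(length = unit(5, "pt"))
  ) +
  geom_point() +
  # event name below each point
  geom_label(aes(label = event),
    vjust = 1, fill = NA, label.size = 0,
    label.padding = unit(8, "pt"),
    color = "black"
  ) +
  # age above each point
  geom_label(aes(label = value),
    vjust = 0, fill = NA, label.size = 0,
    label.padding = unit(8, "pt"),
    color = "black"
  ) +
  # proportion of each sequence, printed at a fixed x position right of the paths
  geom_text(
    data = df,
    aes(label = scales::percent(prop, scale = 1)),
    x = 90,
    color = "black"
  ) +
  # 5% padding on both sides plus 10 extra units on the right for the percentages
  scale_x_continuous(expand = c(0.05, 0, 0.05, 10)) +
  scale_color_brewer(type = "qual", palette = 6) +
  # the labels already name the events, so the legend is dropped
  guides(color = "none") +
  theme_bw() +
  theme(
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank()
  ) +
  labs(
    x = "Age",
    y = "Sequence"
  )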
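Because geom_path() connects points in the order the rows appear in the data (unlike geom_line(), which sorts by x), the arrange(seq_id, value) step is what makes each arrow follow the events in age order. A quick sanity check, sketched here under the assumption that the event columns are A to E as in the question, is to rebuild each sequence string from the sorted long data and compare it with seq. The prop column is left out because it is not needed for the check; the name `check` is only illustrative.

```r
library(tidyverse)

# Same example data as in the question (prop omitted: not needed for this check)
df <- data.frame(
  seq_id = 1:5,
  seq = c("A>B>D", "A>B", "A>D>B", "A>C>E", "A>B>E"),
  A = c(41, 38, 60, 45, 47),
  B = c(42, 40, 68, NA, 52),
  C = c(NA, NA, NA, 50, NA),
  D = c(45, NA, 62, NA, NA),
  E = c(NA, NA, NA, 78, 80)
)

# Long format, observed events only, sorted by age within each sequence
df_long <- df |>
  pivot_longer(A:E, names_to = "event", values_drop_na = TRUE) |>
  arrange(seq_id, value)

# Rebuild each sequence from the row order that geom_path() will follow
check <- df_long |>
  group_by(seq_id, seq) |>
  summarise(derived = paste(event, collapse = ">"), .groups = "drop")

check
# TRUE if every rebuilt sequence matches the original seq column
all(check$derived == check$seq)
```

If the check returns FALSE for your real data, the sequence labels and the recorded ages disagree, and the plotted arrows will follow the ages rather than the labels.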