Tags: r, ggplot2, plot, line-plot

Line plot displaying sequence of events by age using ggplot2


Consider the df below, which contains different sequences of events and the individual's age when each event (A:E) occurs. If an event does not occur, the corresponding column is NA. The proportion of each sequence is given in the column prop.

df <- data.frame(seq_id = seq(1,5),
                 seq = as.character(c("A>B>D", "A>B", "A>D>B", "A>C>E", "A>B>E")),
                 prop = sample(seq(5,50), 5, replace = TRUE),  # random, so values differ between runs
                 A = c(41,38,60,45,47),
                 B = c(42,40,68,NA,52),
                 C = c(NA,NA,NA,50,NA),
                 D = c(45,NA,62,NA,NA),
                 E = c(NA,NA,NA,78,80)
                 )

df
  seq_id   seq prop  A  B  C  D  E
1      1 A>B>D   13 41 42 NA 45 NA
2      2   A>B    8 38 40 NA NA NA
3      3 A>D>B   35 60 68 NA 62 NA
4      4 A>C>E    5 45 NA 50 NA 78
5      5 A>B>E   37 47 52 NA NA 80

I want to obtain a plot similar to the one in this article (https://doi.org/10.1016/j.jbi.2018.11.002), where only the events that occur are displayed in each trajectory line.

I appreciate your help on this.


Solution

  • To get the plot you want, first reshape the data to long format, then draw the chart with geom_point and geom_path. The rest is adding the labels and as much or as little styling as you like.

    Note: Instead of the sequence id I opted to put the sequence on the y axis.

    library(tidyverse)
    
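    # Reshape to long format: one row per (sequence, event), with the age in `value`
    df_long <- df |>
      pivot_longer(-c(1:3), names_to = "event") |>
      filter(!is.na(value)) |>          # drop events that do not occur
      arrange(seq_id, value) |>         # geom_path connects points in row order
      mutate(seq = fct_inorder(seq))    # keep the sequences in seq_id order on the y axis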
    
    df_long |>
      ggplot(aes(value, seq, color = event)) +
      geom_path(
        aes(group = seq),
        arrow = arrow(length = unit(5, "pt"))
      ) +
      geom_point() +
      geom_label(aes(label = event),
        vjust = 1, fill = NA, label.size = 0,
        label.padding = unit(8, "pt"),
        color = "black"
      ) +
      geom_label(aes(label = value),
        vjust = 0, fill = NA, label.size = 0,
        label.padding = unit(8, "pt"),
        color = "black"
      ) +
      # Proportion labels: one per sequence, taken from the wide df, at a fixed x position
      geom_text(
        data = df,
        aes(label = scales::percent(prop, scale = 1)),
        x = 90,
        color = "black"
      ) +
      scale_x_continuous(expand = c(0.05, 0, 0.05, 10)) +
      scale_color_brewer(type = "qual", palette = 6) +
      guides(color = "none") +
      theme_bw() +
      theme(
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank()
      ) +
      labs(
        x = "Age",
        y = "Sequence"
      )
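
Since geom_path connects points in the order the rows appear, it can be worth checking the reshaped data before plotting. The following is only a sketch under a few assumptions: prop is fixed to the values printed in the question (the original uses sample(), so they change between runs), the age column is named age instead of the default value, and values_drop_na = TRUE is used in place of the separate filter() step. The check object is a hypothetical helper that rebuilds each sequence string from the age-sorted events.

```r
library(dplyr)
library(tidyr)

# Same data as in the question, with prop fixed to the printed values
df <- data.frame(
  seq_id = 1:5,
  seq = c("A>B>D", "A>B", "A>D>B", "A>C>E", "A>B>E"),
  prop = c(13, 8, 35, 5, 37),
  A = c(41, 38, 60, 45, 47),
  B = c(42, 40, 68, NA, 52),
  C = c(NA, NA, NA, 50, NA),
  D = c(45, NA, 62, NA, NA),
  E = c(NA, NA, NA, 78, 80)
)

# Long format: one row per event that actually occurs, sorted by age
# within each sequence so that a path drawn in row order follows time
df_long <- df |>
  pivot_longer(A:E, names_to = "event", values_to = "age",
               values_drop_na = TRUE) |>
  arrange(seq_id, age)

# Rebuild "A>B>D"-style strings from the sorted events and compare
# them with the seq column as a sanity check
check <- df_long |>
  group_by(seq_id, seq) |>
  summarise(rebuilt = paste(event, collapse = ">"), .groups = "drop")

all(check$seq == check$rebuilt)
```

If the last line returns FALSE for your own data, the ages and the seq labels disagree for at least one sequence, and the arrows in the plot would not follow the order given in seq.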