
Line plot displaying sequence of events by age using ggplot2

Consider the df below, which contains different sequences of events and the individual's age when each event occurs (columns A:E). If an event does not occur, the corresponding column is NA. The proportion of each sequence is given in the column prop.

df <- data.frame(seq_id = seq(1, 5),
                 seq = as.character(c("A>B>D", "A>B", "A>D>B", "A>C>E", "A>B>E")),
                 # fixed to the values printed below; originally sample(seq(5, 50), 5, replace = TRUE)
                 prop = c(13, 8, 35, 5, 37),
                 A = c(41,38,60,45,47),
                 B = c(42,40,68,NA,52),
                 C = c(NA,NA,NA,50,NA),
                 D = c(45,NA,62,NA,NA),
                 E = c(NA,NA,NA,78,80))

  seq_id   seq prop  A  B  C  D  E
1      1 A>B>D   13 41 42 NA 45 NA
2      2   A>B    8 38 40 NA NA NA
3      3 A>D>B   35 60 68 NA 62 NA
4      4 A>C>E    5 45 NA 50 NA 78
5      5 A>B>E   37 47 52 NA NA 80
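As a quick sanity check on this layout, the seq string can be rebuilt from the age columns alone: drop the NA (non-occurring) events in each row and order the rest by age. The base-R sketch below is only an illustration; the names `event_cols` and `rebuilt` are hypothetical, and prop is fixed to the printed values so the data is reproducible.

```r
# Toy data from the question (prop fixed to the printed values)
df <- data.frame(
  seq_id = 1:5,
  seq = c("A>B>D", "A>B", "A>D>B", "A>C>E", "A>B>E"),
  prop = c(13, 8, 35, 5, 37),
  A = c(41, 38, 60, 45, 47),
  B = c(42, 40, 68, NA, 52),
  C = c(NA, NA, NA, 50, NA),
  D = c(45, NA, 62, NA, NA),
  E = c(NA, NA, NA, 78, 80)
)

event_cols <- c("A", "B", "C", "D", "E")  # hypothetical helper name

# For each row: keep events that occurred (non-NA age), order them by age,
# and paste the event names together with ">"
rebuilt <- apply(df[event_cols], 1, function(ages) {
  occurred <- ages[!is.na(ages)]
  paste(names(occurred)[order(occurred)], collapse = ">")
})

all(rebuilt == df$seq)  # TRUE
```

Every row matches, which confirms that the age order of the non-NA columns is what defines each trajectory.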

I want to obtain a plot similar to the one in https://doi.org/10.1016/j.jbi.2018.11.002, where each trajectory line displays only the events that actually occur.

I appreciate your help on this.


  • To get the desired result, first reshape the data to long format; then you can build the chart with geom_point() and geom_path(). The rest is adding the labels plus as much or as little styling as you like.

    Note: Instead of the sequence id, I put the sequence itself on the y axis.

    library(ggplot2)
    library(tidyr)   # pivot_longer()
    library(dplyr)   # filter(), arrange(), mutate()
    library(forcats) # fct_inorder()

    # One row per event that occurred, ordered by age within each sequence
    df_long <- df |>
      pivot_longer(-c(1:3), names_to = "event") |>
      filter(!is.na(value)) |>
      arrange(seq_id, value) |>
      mutate(seq = fct_inorder(seq))

    df_long |>
      ggplot(aes(value, seq, color = event)) +
      # Connect the events of each sequence, with an arrow on each step
      geom_path(
        aes(group = seq),
        arrow = arrow(length = unit(5, "pt"))
      ) +
      geom_point() +
      # Event name below each point
      geom_label(aes(label = event),
        vjust = 1, fill = NA, label.size = 0,
        label.padding = unit(8, "pt"),
        color = "black"
      ) +
      # Age above each point
      geom_label(aes(label = value),
        vjust = 0, fill = NA, label.size = 0,
        label.padding = unit(8, "pt"),
        color = "black"
      ) +
      # Sequence proportion at the right-hand end of each line
      geom_text(
        data = df,
        aes(label = scales::percent(prop, scale = 1)),
        x = 90,
        color = "black"
      ) +
      scale_x_continuous(expand = c(0.05, 0, 0.05, 10)) +
      scale_color_brewer(type = "qual", palette = 6) +
      guides(color = "none") +
      theme_bw() +
      theme(
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank()
      ) +
      labs(
        x = "Age",
        y = "Sequence"
      )
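To see what the reshaping step hands to geom_path(), here is a rough base-R equivalent of pivot_longer() + filter() + arrange(). It is only a sketch for illustration, not part of the answer's approach; the names `event_cols` and `long` are hypothetical, and it omits the prop column. It should contain the same event/age rows as df_long: one row per event that occurred, ordered by age within each sequence, which is the order geom_path() follows when drawing each trajectory.

```r
# Toy data from the question (prop fixed to the printed values)
df <- data.frame(
  seq_id = 1:5,
  seq = c("A>B>D", "A>B", "A>D>B", "A>C>E", "A>B>E"),
  prop = c(13, 8, 35, 5, 37),
  A = c(41, 38, 60, 45, 47),
  B = c(42, 40, 68, NA, 52),
  C = c(NA, NA, NA, 50, NA),
  D = c(45, NA, 62, NA, NA),
  E = c(NA, NA, NA, 78, 80)
)

event_cols <- c("A", "B", "C", "D", "E")  # hypothetical helper name

# Stack the event columns column by column: one row per (sequence, event) pair
long <- data.frame(
  seq_id = rep(df$seq_id, times = length(event_cols)),
  seq    = rep(df$seq, times = length(event_cols)),
  event  = rep(event_cols, each = nrow(df)),
  value  = unlist(df[event_cols], use.names = FALSE)
)

# Keep only events that occurred, then order each trajectory by age
long <- long[!is.na(long$value), ]
long <- long[order(long$seq_id, long$value), ]
rownames(long) <- NULL

long
```

Of the 25 sequence/event pairs, 11 are NA, so 14 rows remain; for example, sequence 3 comes out as A (60), D (62), B (68), matching its "A>D>B" label.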