I have a dataset that presents a few challenges for transformation in preparation for creating a dumbbell plot:
- In some groups, h_sequ is only ever 1.
- In some groups, h_sequ has values of 1 and 2. An example of this is group 12.
- In some groups, h_sequ takes values 1, 2, and 3, such as group 33.
- In some groups, a value repeats, so h_sequ has values of 1, 1, 2, 3.

   group h_sequ date
   <int>  <int> <date>
 1     1      1 2012-03-27
 2     1      1 2012-03-27
 3    10      1 2016-10-25
 4    10      1 2016-10-25
 5    12      1 2021-06-25
 6    12      2 2022-05-18
 7    31      1 2019-11-28
 8    31      1 2019-11-28
 9    31      2 2021-03-24
10    33      1 2013-09-03
11    33      1 2013-09-03
12    33      2 2019-01-04
13    33      3 2020-07-28
14    35      1 2015-10-21
15    35      2 2017-06-28

data <- structure(list(group = c(1L, 1L, 10L, 10L, 12L, 12L, 31L, 31L,
31L, 33L, 33L, 33L, 33L, 35L, 35L), h_sequ = c(1L, 1L, 1L, 1L,
1L, 2L, 1L, 1L, 2L, 1L, 1L, 2L, 3L, 1L, 2L), date = structure(c(15426,
15426, 17099, 17099, 18803, 19130, 18228, 18228, 18710, 15951,
15951, 17900, 18471, 16729, 17345), class = "Date")), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -15L))

The main question is how to implement the logic for the date column so that all of these scenarios fit into one combined dumbbell plot. I have summarised the minimum and maximum date for each group, but this does not yet account for the varying number of dates per group.

Here is what I have so far:

library(ggplot2)
library(ggalt)
library(dplyr)

data %>%
  # One row per group with its earliest and latest date
  summarise(start_date = min(date), end_date = max(date), .by = group) %>%
  ggplot(aes(x = start_date, xend = end_date, y = group)) +
  geom_dumbbell(color = "red3", size = 3)
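
To make the varying number of dates per group concrete, here is a minimal sketch that counts rows and distinct dates per group alongside the same min/max summary. It rebuilds the sample data with base data.frame() from the dput values above. The names date_counts, n_rows and n_dates are introduced here for illustration only, and the .by argument assumes dplyr 1.1.0 or later.

```r
library(dplyr)

# Sample data, rebuilt from the dput() values in the question
data <- data.frame(
  group  = c(1L, 1L, 10L, 10L, 12L, 12L, 31L, 31L, 31L, 33L, 33L, 33L, 33L, 35L, 35L),
  h_sequ = c(1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 1L, 2L, 3L, 1L, 2L),
  date   = as.Date(c(15426, 15426, 17099, 17099, 18803, 19130, 18228, 18228,
                     18710, 15951, 15951, 17900, 18471, 16729, 17345),
                   origin = "1970-01-01")
)

# Rows, distinct dates, and the min/max range for each group
# (n_rows and n_dates are illustrative column names)
date_counts <- data %>%
  summarise(
    n_rows     = n(),
    n_dates    = n_distinct(date),
    start_date = min(date),
    end_date   = max(date),
    .by = group
  )

print(date_counts)
```

On the sample data, groups 1 and 10 have a single distinct date, so start_date equals end_date and a min/max dumbbell collapses to one point for them, while group 33 has three distinct dates and loses its middle one.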

I would probably manually dodge the co-occurring points and join the points with geom_path. This allows a complete display of all your data.

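library(tidyverse)

data %>%
  # Treat group as discrete so each group gets its own row on the y axis
  mutate(group = factor(group)) %>%
  # Vertical offset for rows sharing a group and date; 0 when a row is alone
  mutate(dodge = (row_number() - median(row_number())) / n() / 3.2,
         .by = c(group, date)) %>%
  ggplot(aes(date, group)) +
  # Gray bar connecting each group's dates
  geom_path(linewidth = 3, color = "gray") +
  # Points shifted by the dodge offset and filled by h_sequ
  geom_point(aes(y = as.numeric(group) + dodge, fill = factor(h_sequ)),
             shape = 21, size = 5) +
  scale_fill_manual("h_sequ", values = c("orange", "deepskyblue4", "red4")) +
  theme_minimal(base_size = 16)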
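
To see what the manual dodge does before plotting, the offsets can be inspected on their own. This is a small sketch rather than part of the plotting code. It rebuilds the sample data with base data.frame() from the dput values in the question, and the name offsets is just an illustrative choice.

```r
library(dplyr)

# Sample data, rebuilt from the dput() values in the question
data <- data.frame(
  group  = c(1L, 1L, 10L, 10L, 12L, 12L, 31L, 31L, 31L, 33L, 33L, 33L, 33L, 35L, 35L),
  h_sequ = c(1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 1L, 2L, 3L, 1L, 2L),
  date   = as.Date(c(15426, 15426, 17099, 17099, 18803, 19130, 18228, 18228,
                     18710, 15951, 15951, 17900, 18471, 16729, 17345),
                   origin = "1970-01-01")
)

# Same dodge formula as in the plot: rows are centred around 0 within each
# (group, date) pair, then scaled down so points stay close to their group's row
offsets <- data %>%
  mutate(dodge = (row_number() - median(row_number())) / n() / 3.2,
         .by = c(group, date))

# Only rows that share a group and date with another row get a nonzero offset
offsets %>% filter(dodge != 0)
```

Rows that are alone on their date get an offset of 0 and sit exactly on the gray bar. The duplicated dates in groups 1, 10, 31 and 33 are split symmetrically above and below it. The divisor 3.2 only controls how far apart the dodged points sit and can be tuned to taste.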