I have the following dataset in R. I want a ggplot whose x-axis runs from 1 to 12 (January, February, ..., December) and whose y-axis runs from 1 to 6 (the num_months variable; the example only contains 1 and 6). I then want to use geom_segment(), with each segment starting at start_month and ending at end_month (so its length represents num_months), and to facet horizontally by the variable year.
My main problems, so far, are:
library(tidyverse) # loads readr, dplyr and ggplot2

data <- read_csv("num_months,start_month_year,end_month_year,B1,B1_p,year,start_month,end_month
1,6,6,3.3571016788482666,0.007681768853217363,2021,5,5
1,8,8,2.548985481262207,0.007373321335762739,2021,7,7
1,10,10,2.139772415161133,0.03452971577644348,2021,9,9
1,12,12,2.165775775909424,0.07796278595924377,2021,11,11
1,13,13,1.9506219625473022,0.09215697646141052,2021,12,12
1,23,23,2.7839596271514893,0.011407249607145786,2022,10,10
1,25,25,2.220555543899536,0.06181173026561737,2022,12,12
6,6,11,0.9881601333618164,0.08719704300165176,2021,5,10
6,8,13,1.438501238822937,0.032221969217061996,2021,7,12
6,9,14,1.16400945186615,0.09187468141317368,2021,8,1
6,10,15,1.5834165811538696,0.03494146466255188,2021,9,2
6,11,16,1.294316291809082,0.09792502969503403,2021,10,3
6,12,17,1.4204859733581543,0.0546354204416275,2021,11,4
6,20,25,1.07038414478302,0.0722803920507431,2022,7,12") %>%
  mutate(
    end_month = ifelse(start_month == end_month, end_month + 1, end_month),
    end_month = ifelse(end_month > 12, 1, end_month) # Wrap around to January if end_month exceeds 12
  ) %>%
  group_by(year, num_months) %>%
  mutate(
    y_pos = num_months + (row_number() - 1) * 0.2 # Adding a systematic offset to y position
  ) %>%
  ungroup()

# Create the boxes for num_months
boxes <- data %>%
  group_by(year, num_months) %>%
  summarise(
    ymin = min(y_pos) - 0.3,
    ymax = max(y_pos) + 0.3
  ) %>%
  ungroup()

# Create the ggplot
p <- ggplot(data) +
  geom_rect(data = boxes, aes(xmin = 0.5, xmax = 12.5, ymin = ymin, ymax = ymax), fill = NA, color = "grey") +
  geom_segment(aes(x = start_month, xend = end_month, y = y_pos, yend = y_pos, color = as.factor(num_months)), size = 1) +
  scale_x_continuous(breaks = 1:12, limits = c(0.5, 12.5), labels = month.abb) +
  scale_y_continuous(breaks = 1:6, limits = c(0.5, 6.5), expand = expansion(mult = c(0.02, 0.1))) + # Adjusting y-axis limits to accommodate offset
  facet_wrap(~ year) +
  labs(x = "Month", y = "Number of Months", color = "Number of Months") +
  theme_minimal() +
  theme(panel.spacing = unit(1, "lines")) # Increase spacing between panels
print(p)
Here's how it looks: segments with the same num_months overlap, and the offset lines spill into the box drawn for a different num_months.
Here's my suggestion. The big changes are:
I use actual dates, not numbers, for the x-axis, with the start date at the start of the month and the end date the last day of the month. This makes the segments "occupy the full months".
Since you want the num_months to look "more like panels", I include them in the faceting. (Note that you can facet by more than one variable in the rows, so if you also want to facet country by rows you can do that too, see the "Margins" example at the bottom of the ?facet_grid help page.)
Since we have faceted by num_months, this lets us use the grouped row numbers as the y aesthetic, evenly spacing the lines regardless of how many there are.
Since theme_minimal() doesn't draw panels for its facets, I switched to theme_bw(), but you can of course customize the theming however you want.
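To make the "occupy the full months" idea concrete, here is a minimal sketch of how a month number can be turned into a first-day/last-day pair. The helper name month_span is hypothetical and only for illustration; the dummy year 2023 is an arbitrary choice so that every segment shares one x-axis.

```r
library(lubridate)

# Hypothetical helper for illustration only: map month numbers to the first
# and last day of that month in a dummy year.
month_span <- function(month, year = 2023) {
  start <- ymd(paste(year, month, "01", sep = "-"))
  # Round up to the first day of the next month, then step back one day
  end <- ceiling_date(start, unit = "month") - 1
  data.frame(start = start, end = end)
}

month_span(c(2, 12))
```

A segment drawn from start to end then covers the whole month instead of collapsing to a single point when start_month equals end_month.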
library(tidyverse) # readr, dplyr, ggplot2
library(lubridate) # ymd(), ceiling_date()

data <- read_csv("num_months,start_month_year,end_month_year,B1,B1_p,year,start_month,end_month
1,6,6,3.3571016788482666,0.007681768853217363,2021,5,5
1,8,8,2.548985481262207,0.007373321335762739,2021,7,7
1,10,10,2.139772415161133,0.03452971577644348,2021,9,9
1,12,12,2.165775775909424,0.07796278595924377,2021,11,11
1,13,13,1.9506219625473022,0.09215697646141052,2021,12,12
1,23,23,2.7839596271514893,0.011407249607145786,2022,10,10
1,25,25,2.220555543899536,0.06181173026561737,2022,12,12
6,6,11,0.9881601333618164,0.08719704300165176,2021,5,10
6,8,13,1.438501238822937,0.032221969217061996,2021,7,12
6,9,14,1.16400945186615,0.09187468141317368,2021,8,1
6,10,15,1.5834165811538696,0.03494146466255188,2021,9,2
6,11,16,1.294316291809082,0.09792502969503403,2021,10,3
6,12,17,1.4204859733581543,0.0546354204416275,2021,11,4
6,20,25,1.07038414478302,0.0722803920507431,2022,7,12") %>%
  mutate(
    # Dummy year 2023: only the month matters, so both year facets share one x-axis
    start_dt = ymd(paste("2023", start_month, "01", sep = "-")),
    # Last day of end_month: round up to the first of the next month, then subtract a day
    end_dt = ceiling_date(ymd(paste("2023", end_month, "01", sep = "-")), unit = "month") - 1
  ) %>%
  mutate(
    # Row number within each year/num_months facet, used as the y position
    # (.by needs dplyr >= 1.1.0)
    yy = row_number(),
    .by = c(year, num_months)
  )

ggplot(data) +
  # linewidth replaces the deprecated size argument for lines (ggplot2 >= 3.4.0)
  geom_segment(aes(x = start_dt, xend = end_dt, y = yy, yend = yy, color = factor(num_months)), linewidth = 1) +
  scale_x_date(
    date_labels = "%b",
    date_breaks = "1 month",
    limits = ymd(c("2023-01-01", "2023-12-31")),
    expand = expansion(0, 0)
  ) +
  scale_y_continuous(labels = NULL) +
  facet_grid(rows = vars(num_months), cols = vars(year), space = "free_y", scales = "free_y") +
  labs(x = "Month", y = "Number of Months", color = "Number of Months") +
  theme_bw() +
  theme(
    panel.spacing = unit(1, "lines"), # Increase spacing between panels
    panel.grid.major.y = element_blank(),
    axis.ticks.y = element_blank()
  )
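One caveat: rows whose interval runs past December (for example start_month 8, end_month 1) end up with an end date earlier than their start date in the dummy year, so the segment is drawn across the wrong months. A possible way around this, sketched below, is to split such rows into two pieces before building the dates. The helper split_wrapped and the seg_id column are hypothetical names, and the sketch assumes that year refers to the year in which the interval starts.

```r
library(dplyr)

# Hypothetical helper, not part of the code above: split any interval whose
# end_month is earlier than its start_month into two rows.
# Assumption: `year` is the year in which the interval starts.
split_wrapped <- function(df) {
  df <- mutate(df, seg_id = row_number()) # remember which pieces belong together
  wraps <- df$end_month < df$start_month
  bind_rows(
    df[!wraps, ],                                         # intervals within one year
    mutate(df[wraps, ], end_month = 12),                  # piece in the start year
    mutate(df[wraps, ], start_month = 1, year = year + 1) # piece in the next year
  ) %>%
    arrange(seg_id, year)
}

# Toy input: one ordinary interval and one that wraps into the next year
toy <- tibble(
  num_months = c(6, 6),
  year = c(2021, 2021),
  start_month = c(5, 8),
  end_month = c(10, 1)
)
split_wrapped(toy)
```

Applying something like this before the mutate() that builds start_dt, end_dt and yy would draw each piece in its own year's panel. Whether the carried-over piece should appear in the following year's facet at all is a design choice that depends on what the plot is meant to show.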