I have some data where every person is either "satisfied" or "dissatisfied". Then, every person also has two types of calculated distances. I have no issues plotting the boxplot. However, I cannot figure out how to plot a line between the two medians of the boxplots.
Reproducible Code:
library(ggplot2)

group1.ids <- c('A123', 'B123', 'C123', 'D123')
# Each id appears twice: once per distance type
Dis.data <- data.frame(id = rep(group1.ids, 2),
                       satisfaction = rep('Dissatisfied', 8),
                       DistType = c(rep('A', 4),
                                    rep('B', 4)),
                       Dist = runif(8))
group2.ids <- c('E123', 'F123', 'G123', 'H123')
Sat.data <- data.frame(id = rep(group2.ids, 2),
                       satisfaction = rep('Satisfied', 8),
                       DistType = c(rep('A', 4),
                                    rep('B', 4)),
                       Dist = runif(8))
data <- rbind(Dis.data, Sat.data)
ggplot(data) +
  geom_boxplot(mapping = aes(x = satisfaction, y = Dist, fill = DistType))
One way is to convert your x-axis to a continuous scale. To do this, we'll first factorize your satisfaction and DistType variables (you can control the order with levels= if needed), and then we can use geom_line to add your lines.
data2 <- transform(data, satisfaction = factor(satisfaction), DistType = factor(DistType))
medians <- aggregate(Dist ~ DistType + satisfaction, data2, FUN = median) |>
  transform(x = as.numeric(satisfaction) + as.numeric(DistType)/3 - 0.5)
medians
#   DistType satisfaction      Dist         x
# 1        A Dissatisfied 0.2042941 0.8333333
# 2        B Dissatisfied 0.5780955 1.1666667
# 3        A    Satisfied 0.7128209 1.8333333
# 4        B    Satisfied 0.6022990 2.1666667
ggplot(data2) +
  geom_boxplot(mapping = aes(x = as.numeric(satisfaction), y = Dist, fill = DistType,
                             group = interaction(satisfaction, DistType))) +
  scale_x_continuous(name = "Satisfaction",
                     breaks = seq_along(levels(data2$satisfaction)),
                     labels = levels(data2$satisfaction)) +
  geom_line(aes(x = x, y = Dist, group = satisfaction), data = medians)
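To see where the x values for the line come from, here is a small base-R sketch of the same offset formula with no plotting involved. The vector names (sat, dtype) and the reversed level order are only illustrative assumptions, not part of your data. Note that the /3 - 0.5 arithmetic is specific to a DistType with exactly two levels: it shifts each group's integer position by -1/6 or +1/6.

```r
# Illustrative vectors (hypothetical names); levels= sets the left-to-right order.
sat <- factor(c("Dissatisfied", "Dissatisfied", "Satisfied", "Satisfied"),
              levels = c("Satisfied", "Dissatisfied"))
dtype <- factor(c("A", "B", "A", "B"))

# as.numeric() on a factor returns its integer level codes:
# "Satisfied" is 1 and "Dissatisfied" is 2 with the levels given above.
sat_pos <- as.numeric(sat)

# Same formula as in the answer: code 1 ("A") shifts by -1/6, code 2 ("B") by +1/6.
x <- sat_pos + as.numeric(dtype) / 3 - 0.5

data.frame(satisfaction = sat, DistType = dtype, x = x)
```

With this level order the "Satisfied" pair sits around x = 1 and the "Dissatisfied" pair around x = 2, so changing levels= is enough to swap the groups on the axis, as long as the breaks and labels in scale_x_continuous are built from the same factor.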