I'd like to get the same order of boxplot fills within temporal groups and across facets using ggplot2 in R. Boxes should be drawn along a continuous x-axis, and the box width should scale with the number of values within each group, as provided by position_dodge2().
Note that the first box within a temporal group (marked by vertical lines) is sometimes blue and sometimes red.
Converting the variable used for coloring to a factor does not help. It also does not seem that the first coloring value within a temporal group, or its frequency, determines which color comes first. Otherwise position_dodge2() does a very good job here, producing the boxes exactly as I want.
Minimal example:
library(ggplot2)

time_data = data.frame(
  time = 1:100,
  y.var = rep(seq(0, 1, .02), 2)[1:100],
  f.var = rep(c("A", "B", "C", "D"), 25),
  time.group = c(
    rep("q", 10),
    rep("r", 35),
    rep("s", 5),
    rep("t", 30),
    rep("u", 20)
  ),
  col.group = rep(c(TRUE, FALSE, TRUE), 40)[1:100]
)

# Last time value of each temporal group; lead() comes from dplyr
break.time = time_data$time[which(time_data$time.group != dplyr::lead(time_data$time.group))]

ggplot() +
  facet_grid(f.var ~ .) +
  geom_boxplot(data = time_data, aes(
    x = time, y = y.var,
    fill = col.group,
    group = paste(time.group, col.group)
  )) +
  geom_vline(xintercept = break.time)
Thanks for any help.
The issue is that the position of your box plots is determined by the time values of each box plot, i.e. dodging only has an effect in the cases where mean(time) is the same for both col.groups in a time.group. Otherwise the position is determined by mean(time).
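As a quick check outside of ggplot2, the center of the time values can be computed for each box with base R. This is only a sketch: it rebuilds the relevant columns of the example data, takes mean(time) as the center as described above, and the name centers is just chosen for illustration.

```r
# Rebuild the relevant columns of the example data from the question
time_data <- data.frame(
  time = 1:100,
  f.var = rep(c("A", "B", "C", "D"), 25),
  time.group = rep(c("q", "r", "s", "t", "u"), times = c(10, 35, 5, 30, 20)),
  col.group = rep(c(TRUE, FALSE, TRUE), 40)[1:100]
)

# One row per box: facet x time group x fill group, with the mean of its time values
centers <- aggregate(time ~ col.group + time.group + f.var, data = time_data, FUN = mean)

# Sort by center within facet and time group to compare the two fill groups
centers <- centers[order(centers$f.var, centers$time.group, centers$time), ]
head(centers)
```

Wherever the two col.group rows of a time group have different centers, the boxes are placed at those centers instead of being dodged around a common position.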
To make this visible I added a stat_boxplot using geom = "text". From this you can see that the "order" of the box plots is determined by mean(time):
ggplot() +
  facet_grid(f.var ~ .) +
  geom_boxplot(data = time_data, aes(
    x = time, y = y.var,
    fill = col.group,
    group = paste(time.group, col.group)
  )) +
  stat_boxplot(
    data = time_data, geom = "text",
    aes(
      label = after_stat(x),
      x = time,
      y = stage(y.var, after_stat = 0),
      group = paste(time.group, col.group)
    ),
    vjust = 0,
    position = position_dodge2(.75)
  ) +
  geom_vline(xintercept = break.time)
That said, one option to achieve your desired result would be to make the x positions the same for both col.groups per time.group, which could be achieved using stage() and an after_stat= calculation using e.g. ave(). As a first step we could get the order right per time group by computing the position per group.
library(ggplot2)

ggplot() +
  facet_grid(f.var ~ .) +
  geom_boxplot(data = time_data, aes(
    x = stage(time, after_stat = ave(x, group, FUN = mean)),
    y = y.var,
    fill = col.group,
    group = paste(col.group, time.group)
  )) +
  geom_vline(xintercept = break.time) +
  scale_color_manual(values = "black", guide = "none")
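For reference, this is what ave() does when it is handed a vector of positions and a grouping vector. The values below are made up for illustration and are not taken from the plot; the point is only that every element is replaced by the mean of its group, which is what gives two boxes a shared x position.

```r
# Hypothetical box positions after the stat: two boxes per time group
x <- c(3, 7, 20, 30)
time.group <- c("q", "q", "r", "r")

# ave() returns a vector of the same length and order,
# with each value replaced by the mean of its group
shared <- ave(x, time.group, FUN = mean)
shared
# [1]  5  5 25 25
```

Because both boxes of a group then sit at the same x, position_dodge2() has something to dodge.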
However, getting the same order of the col.groups for all time groups requires even more effort. The issue is that we need to ensure that the information on the time group is part of the dataset after the stat has been applied. The only way I figured out to achieve that was once again using stage() to map the time group on the color aes, then setting color to a constant value after the stat, which scale_color_manual maps to the default "black" while also getting rid of the color legend:
ggplot() +
  facet_grid(f.var ~ .) +
  geom_boxplot(data = time_data, aes(
    x = stage(time, after_stat = ave(x, color, FUN = mean)),
    y = y.var,
    fill = col.group,
    color = stage(time.group, after_stat = "1"),
    group = paste(col.group, time.group)
  )) +
  geom_vline(xintercept = break.time) +
  scale_color_manual(values = "black", guide = "none")