I am using the package ggh4x and the following data set and code to create a boxplot with a nested relation between two categorical variables.
Data used
set1 <- structure(list(Tx = c("Not Exposed", "Not Exposed", "Not Exposed", "Not Exposed", "Not Exposed", "Not Exposed", "Not Exposed", "Not Exposed",
"Not Exposed", "Not Exposed", "Exposed", "Exposed", "Exposed", "Exposed", "Exposed",
"Exposed", "Exposed", "Exposed", "Exposed", "Exposed", "Not Exposed", "Not Exposed",
"Not Exposed", "Not Exposed", "Not Exposed", "Not Exposed", "Not Exposed", "Not Exposed", "Not Exposed", "Not Exposed", "Exposed", "Exposed",
"Exposed", "Exposed", "Exposed", "Exposed", "Exposed", "Exposed",
"Exposed", "Exposed"), Species = structure(c(1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L), levels = c("Species1", "Species2"), class = "factor"), Size = c(88.5,
83.3, 59.5, 78, 50.3, 57, 78.2, 59, 85, 59.5, 13.1, 50.1, 55,
60.1, 13.8, 27, 57.1, 53.1, 42, 16, 88.8, 26.2, 62, 108.5, 92.3,
74.4, 77.3, 96, 88.7, 77.8, 50.7, 61.9, 65.1, 63.5, 64, 88.6,
53.8, 82.1, 78.8, 75.6)), row.names = c(NA, -40L), class = c("tbl_df",
"tbl", "data.frame"))
Nested boxplot
library(ggplot2)
library(ggh4x)
ggplot(set1, aes(x = interaction(Tx, Species), y = Size)) +
  stat_boxplot(geom = "errorbar", width = 0.15) +
  geom_boxplot(show.legend = FALSE, outlier.shape = NA, aes(fill = interaction(Tx, Species))) +
  geom_jitter(width = 0.1, shape = 21, colour = "black", fill = "grey95", stroke = 0.5, size = 1) +
  guides(x = "axis_nested") +
  theme_classic() +
  theme(axis.title = element_text(face = "bold"),
        text = element_text(family = "serif", size = 12.5))
Right now, the nested relation is displayed on the x-axis just like I wanted. However, the groups are ordered alphabetically, and I'd like to set the order myself (with the "Not Exposed" group before "Exposed").
I tried doing it with weave_factors() instead of interaction(), but then the plot doesn't display the nested relation correctly.
Is there an existing method to selectively reorder the groups?
In 99.9% of questions related to the (re-)ordering of axes, facets or legends the answer is always the same:
Convert your variable(s) to factor(s) with the order of the levels set according to the desired order.
While there is an option to achieve your desired result using weave_factors, it depends on the order of the data (and some additional changes, see below), and hence I think the more robust approach to make "Not Exposed" the first category is to use
set1$Tx <- factor(set1$Tx, levels = c("Not Exposed", "Exposed"))
or relevel as in the answer by @StephanLaurent or, depending on the desired order, one of the several convenience functions in the forcats package.
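As a minimal sketch of the options just mentioned, the snippet below uses a small made-up vector (tx is a stand-in for set1$Tx, not the real data) to compare the default level order with two base R ways of putting "Not Exposed" first. The forcats call is guarded because that package may not be installed.

```r
# Toy stand-in for set1$Tx (hypothetical values, for illustration only)
tx <- c("Not Exposed", "Exposed", "Not Exposed", "Exposed")

# Default: factor() sorts the levels alphabetically
default_levels <- levels(factor(tx))
# "Exposed" "Not Exposed"

# Option 1: state the full level order explicitly
tx_explicit <- factor(tx, levels = c("Not Exposed", "Exposed"))

# Option 2: relevel() moves a single reference level to the front
tx_relevel <- relevel(factor(tx), ref = "Not Exposed")

levels(tx_explicit)
# "Not Exposed" "Exposed"
levels(tx_relevel)
# "Not Exposed" "Exposed"

# Option 3: forcats::fct_relevel() also moves the named level(s) to the front
if (requireNamespace("forcats", quietly = TRUE)) {
  print(levels(forcats::fct_relevel(tx, "Not Exposed")))
}
```

With only two levels all three give the same result; with more levels, the explicit levels argument is the one that pins down the complete order.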
However, when doing so you have to use interaction to get the desired nested axis (as in all examples in the docs, see ?guide_axis_nested).
library(ggplot2)
library(ggh4x)
set1$Tx <- factor(set1$Tx, levels = c("Not Exposed", "Exposed"))
ggplot(set1, aes(x = interaction(Tx, Species), y = Size)) +
  stat_boxplot(geom = "errorbar", width = 0.15) +
  geom_boxplot(
    show.legend = FALSE, outlier.shape = NA,
    aes(fill = interaction(Tx, Species))
  ) +
  geom_jitter(
    width = 0.1, shape = 21, colour = "black",
    fill = "grey95", stroke = 0.5, size = 1
  ) +
  guides(x = "axis_nested") +
  theme_classic() +
  theme(
    axis.title = element_text(face = "bold"),
    text = element_text(family = "serif", size = 12.5)
  )
However, for your (example) data, and accounting for how weave_factors works and differs from interaction (see below), you could actually achieve your desired result without converting to a factor by switching the order in which you pass Species and Tx to weave_factors and by using the more verbose guide_axis_nested() with inv = TRUE:
library(ggplot2)
library(ggh4x)
# Just to ensure that Tx is a non-factor
set1$Tx <- as.character(set1$Tx)
ggplot(set1, aes(x = weave_factors(Species, Tx), y = Size)) +
  stat_boxplot(geom = "errorbar", width = 0.15) +
  geom_boxplot(
    show.legend = FALSE, outlier.shape = NA,
    aes(fill = weave_factors(Species, Tx))
  ) +
  geom_jitter(
    width = 0.1, shape = 21, colour = "black",
    fill = "grey95", stroke = 0.5, size = 1
  ) +
  guides(x = guide_axis_nested(inv = TRUE)) +
  theme_classic() +
  theme(
    axis.title = element_text(face = "bold"),
    text = element_text(family = "serif", size = 12.5)
  )
weave_factors vs. interaction:
weave_factors differs from interaction in two respects (see ?weave_factors):
It orders the new levels such that the levels of the first input variable are given priority over those of the second input.
It treats non-factor inputs as if their levels were unique(as.character(x)), i.e. the levels are set in the order in which they appear in the data (similar to what forcats::fct_inorder does).
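The second point is why the weave_factors approach depends on the row order of the data. A rough base R sketch, using made-up vectors rather than set1, shows how unique(as.character(x)) follows first appearance while factor() sorts alphabetically:

```r
# Toy character vector (hypothetical, for illustration only)
tx <- c("Not Exposed", "Not Exposed", "Exposed", "Exposed")

# factor() on a character vector sorts the levels alphabetically
alphabetical <- levels(factor(tx))
# "Exposed" "Not Exposed"

# unique(as.character(x)) keeps the order of first appearance
first_seen <- unique(as.character(tx))
# "Not Exposed" "Exposed"

# The same values in a different row order give a different level order
first_seen_reversed <- unique(as.character(rev(tx)))
# "Exposed" "Not Exposed"
```

So if the rows of the data were sorted differently, a character Tx would produce a different group order, whereas an explicit factor would not.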
For that reason weave_factors gives IMHO a more natural ordering of the levels of the combined factors:
weave_factors(set1$Tx, set1$Species) |> levels()
#> [1] "Not Exposed.Species1" "Not Exposed.Species2" "Exposed.Species1"
#> [4] "Exposed.Species2"
i.e. the levels of the combined factor are "ordered" first by the first input, then by the second, whereas with interaction it's the other way around:
interaction(set1$Tx, set1$Species) |> levels()
#> [1] "Not Exposed.Species1" "Exposed.Species1" "Not Exposed.Species2"
#> [4] "Exposed.Species2"
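As a side note, base R's interaction() has a lex.order argument that changes which input varies fastest in the combined levels. The sketch below uses small made-up factors (not set1) and only inspects the resulting levels; whether the nested axis guide then renders as desired is not checked here.

```r
# Toy factors (hypothetical, for illustration only)
tx <- factor(
  c("Not Exposed", "Exposed", "Not Exposed", "Exposed"),
  levels = c("Not Exposed", "Exposed")
)
species <- factor(c("Species1", "Species1", "Species2", "Species2"))

# Default (lex.order = FALSE): the first input varies fastest
default_order <- levels(interaction(tx, species))
# "Not Exposed.Species1" "Exposed.Species1"
# "Not Exposed.Species2" "Exposed.Species2"

# lex.order = TRUE: the first input varies slowest, as with weave_factors
lex_order <- levels(interaction(tx, species, lex.order = TRUE))
# "Not Exposed.Species1" "Not Exposed.Species2"
# "Exposed.Species1"     "Exposed.Species2"
```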