I'm trying to build a custom stat function with ggplot2 wherein I would like to access a discrete variable to compute a statistic with per group. However, the default behaviour of ggplot layers is to automatically assign implicit groups to any discrete variables (mostly). This means that my data gets split up over an automatic grouping, which I wouldn't want.
I can show this as follows; I have a pretty standard constructor:
library(ggplot2)
stat_example <- function(
mapping = NULL,
data = NULL,
geom = "point",
position = "identity",
...,
na.rm = FALSE,
show.legend = NA,
inherit.aes = TRUE
) {
layer(data = data,
mapping = mapping,
stat = StatExample,
geom = geom,
position = position,
show.legend = show.legend,
inherit.aes = inherit.aes,
params = list(na.rm = na.rm))
}
And I have a Stat ggproto object that simply passes along the data, but prints the head of the data for illustration purposes. I've called the bit that I'm interested in for computing an actual stat value
here.
StatExample <- ggproto(
"StatExample",
Stat,
required_aes = c("x", "y", "value"),
default_aes = aes(x = after_stat(x), y = after_stat(y)),
compute_group = function(data, scales) {
print(head(data, 2))
data
}
)
Now if I construct a plot with this stat, we can see what goes into the compute_group()
function as data
.
g <- ggplot(iris) +
stat_example(aes(Sepal.Width, Sepal.Length, value = Species))
# To get only the print side-effect, not the plot (which looks normal)
g <- ggplotGrob(g)
#> x y value PANEL group
#> 1 3.5 5.1 setosa 1 1
#> 2 3.0 4.9 setosa 1 1
#> x y value PANEL group
#> 51 3.2 7.0 versicolor 1 2
#> 52 3.2 6.4 versicolor 1 2
#> x y value PANEL group
#> 101 3.3 6.3 virginica 1 3
#> 102 2.7 5.8 virginica 1 3
Created on 2020-05-28 by the reprex package (v0.3.0)
I would like to have 1 data.frame containing all the data for this case. We see above that we've printed 3 data.frames with different group
variables, meaning that the data has been split into 3 groups. What I think it would take to get there, is to have the value
variable escape the automatic group detection.
I've considered the following points:
-1
, which is the standard 'no group'-group. However, when I do this, the data will not get automatically grouped when for example aes(colour = some_variable)
. This I definitively want to happen.ggplot2:::add_group()
function, it seems I can escape the autogrouping by naming my value
variable label
, however this would make the stat incompatible with geom_text()
and it doesn't describe the meaning of value
naturally.layer()
call with a variant of this function, that would make a different Layer ggproto object wherein compute_aesthetics()
works out groups differently. This however is a lot of work I would rather prevent to be burdened with.vctrs::new_vctr(..., class = "not_discrete")
, but where is the appropriate place to wrap my value
variable in that class?Helpful suggestions are welcome, or new takes on 'just use label
' arguments too.
If this is an occasional use case, a simple (albeit manual) hack could be running trace(ggplot2:::add_group, edit = TRUE)
and add "value"
alongside "label", "PANEL"
as variable names to be excluded from automatic group detection.
A less manual (but probably more fragile) way to achieve the same effect would involve the following steps:
add_group
function with the above modification;Layer
ggproto object that uses the modified add_group
in its compute_aesthetics
function;# define modified add_group function
add_group2 <- function (data) {
if (ggplot2:::empty(data))
return(data)
if (is.null(data$group)) {
disc <- vapply(data, ggplot2:::is.discrete, logical(1))
disc[names(disc) %in% c("label", "PANEL", "value")] <- FALSE # change here
if (any(disc)) {
data$group <- vctrs::vec_group_id(data[disc])
}
else {
data$group <- ggplot2:::NO_GROUP
}
} else {
data$group <- vctrs::vec_group_id(data["group"])
}
data
}
# define modified compute_aesthetics function that uses modified add_group in second last line
compute_aesthetics_alt <- .subset2(ggplot2:::Layer, "compute_aesthetics")
body(compute_aesthetics_alt)[[length(body(compute_aesthetics_alt)) - 1]] <-
quote(evaled <- add_group2(evaled))
# define modified Layer ggproto object that uses alternative compute_aesthetics
Layer2 <- ggproto("Layer2",
ggplot2:::Layer,
compute_aesthetics = compute_aesthetics_alt)
# define modified stat with Layer2 specified as its layer_class
stat_example <- function(
mapping = NULL,
data = NULL,
geom = "point",
position = "identity",
...,
na.rm = FALSE,
show.legend = NA,
inherit.aes = TRUE
) {
layer(data = data,
mapping = mapping,
stat = StatExample,
geom = geom,
position = position,
show.legend = show.legend,
inherit.aes = inherit.aes,
params = list(na.rm = na.rm),
layer_class = Layer2) # change here
}
Usage:
# add new column to simulate different colour
iris$gg <- sample(c("a", "b"), size = nrow(iris), replace = TRUE)
ggplot(iris) +
stat_example(aes(Sepal.Width, Sepal.Length,
value = Species))
# prints one data frame, because there's only one group by default
ggplot(iris) +
stat_example(aes(Sepal.Width, Sepal.Length,
value = Species, colour = gg))
# prints two data frames, because grouping is based on the colour aesthetic,
# which has two possible values