rggplot2ggproto

Escape 'discrete aesthetic implies group' in custom stat


I'm trying to build a custom stat function with ggplot2 wherein I would like to access a discrete variable to compute a statistic with per group. However, the default behaviour of ggplot layers is to automatically assign implicit groups to any discrete variables (mostly). This means that my data gets split up over an automatic grouping, which I wouldn't want.

I can show this as follows; I have a pretty standard constructor:

library(ggplot2)

stat_example <- function(
  mapping = NULL,
  data = NULL,
  geom = "point",
  position = "identity",
  ...,
  na.rm = FALSE,
  show.legend = NA,
  inherit.aes = TRUE
) {
  layer(data = data,
        mapping = mapping,
        stat = StatExample,
        geom = geom,
        position = position,
        show.legend = show.legend,
        inherit.aes = inherit.aes,
        params = list(na.rm = na.rm))
}

And I have a Stat ggproto object that simply passes along the data, but prints the head of the data for illustration purposes. I've called the bit that I'm interested in for computing an actual stat value here.

StatExample <- ggproto(
  "StatExample",
  Stat,
  required_aes = c("x", "y", "value"),
  default_aes = aes(x = after_stat(x), y = after_stat(y)),
  compute_group = function(data, scales) {
    print(head(data, 2))
    data
  }
)

Now if I construct a plot with this stat, we can see what goes into the compute_group() function as data.

g <- ggplot(iris) +
  stat_example(aes(Sepal.Width, Sepal.Length, value = Species))

# To get only the print side-effect, not the plot (which looks normal)
g <- ggplotGrob(g)
#>     x   y  value PANEL group
#> 1 3.5 5.1 setosa     1     1
#> 2 3.0 4.9 setosa     1     1
#>      x   y      value PANEL group
#> 51 3.2 7.0 versicolor     1     2
#> 52 3.2 6.4 versicolor     1     2
#>       x   y     value PANEL group
#> 101 3.3 6.3 virginica     1     3
#> 102 2.7 5.8 virginica     1     3

Created on 2020-05-28 by the reprex package (v0.3.0)

I would like to have 1 data.frame containing all the data for this case. We see above that we've printed 3 data.frames with different group variables, meaning that the data has been split into 3 groups. What I think it would take to get there, is to have the value variable escape the automatic group detection.

I've considered the following points:

Helpful suggestions are welcome, or new takes on 'just use label' arguments too.


Solution

  • If this is an occasional use case, a simple (albeit manual) hack could be running trace(ggplot2:::add_group, edit = TRUE) and add "value" alongside "label", "PANEL" as variable names to be excluded from automatic group detection.

    A less manual (but probably more fragile) way to achieve the same effect would involve the following steps:

    1. Define a modified version of the add_group function with the above modification;
    2. Define a modified version of the Layer ggproto object that uses the modified add_group in its compute_aesthetics function;
    3. Point the custom stat to the modified layer.
    # define modified add_group function
    add_group2 <- function (data) {
      if (ggplot2:::empty(data)) 
        return(data)
      if (is.null(data$group)) {
        disc <- vapply(data, ggplot2:::is.discrete, logical(1))
        disc[names(disc) %in% c("label", "PANEL", "value")] <- FALSE         # change here
        if (any(disc)) {
          data$group <- vctrs::vec_group_id(data[disc])
        }
        else {
          data$group <- ggplot2:::NO_GROUP
        }
      } else {
        data$group <- vctrs::vec_group_id(data["group"])
      }
      data
    }
    
    # define modified compute_aesthetics function that uses modified add_group in second last line
    compute_aesthetics_alt <- .subset2(ggplot2:::Layer, "compute_aesthetics")
    body(compute_aesthetics_alt)[[length(body(compute_aesthetics_alt)) - 1]] <- 
      quote(evaled <- add_group2(evaled))
    
    # define modified Layer ggproto object that uses alternative compute_aesthetics
    Layer2 <- ggproto("Layer2",
                      ggplot2:::Layer,
                      compute_aesthetics = compute_aesthetics_alt)
    
    # define modified stat with Layer2 specified as its layer_class
    stat_example <- function(
      mapping = NULL,
      data = NULL,
      geom = "point",
      position = "identity",
      ...,
      na.rm = FALSE,
      show.legend = NA,
      inherit.aes = TRUE
    ) {
      layer(data = data,
            mapping = mapping,
            stat = StatExample,
            geom = geom,
            position = position,
            show.legend = show.legend,
            inherit.aes = inherit.aes,
            params = list(na.rm = na.rm),
            layer_class = Layer2) # change here
    }
    

    Usage:

    # add new column to simulate different colour
    iris$gg <- sample(c("a", "b"), size = nrow(iris), replace = TRUE) 
    
    ggplot(iris) + 
      stat_example(aes(Sepal.Width, Sepal.Length,
                       value = Species))
    # prints one data frame, because there's only one group by default
    
    ggplot(iris) + 
      stat_example(aes(Sepal.Width, Sepal.Length,
                       value = Species, colour = gg))
    # prints two data frames, because grouping is based on the colour aesthetic,
    # which has two possible values