Hi folks. I'm working on a ggplot2 extension that will implement a new faceting method.
I don't want to get into the nitty gritty of the algorithm, but suffice it to say that I need to first compute some new columns for each row of the input data
, and only then can I perform a compute_layout
and map_data
.
Of course, one option is to compute my new columns twice, once inside of compute_layout
and once again inside of map_data
, but this will be twice as expensive computationally, and just less elegant.
It seems that setup_params
and setup_data
are meant for this exact use case.
I'm creating a little reproducible example based off this great vignette.
I've just made a small modification that tries to add a hello
column to the data using the setup_data
function.
library(ggplot2)
facet_bootstrap <- function(n = 9, prop = 0.2, nrow = NULL, ncol = NULL,
scales = "fixed", shrink = TRUE, strip.position = "top") {
facet <- facet_wrap(~.bootstrap, nrow = nrow, ncol = ncol, scales = scales,
shrink = shrink, strip.position = strip.position)
facet$params$n <- n
facet$params$prop <- prop
ggproto(NULL, FacetBootstrap,
shrink = shrink,
params = facet$params
)
}
FacetBootstrap <- ggproto("FacetBootstrap", FacetWrap,
setup_data = function(data, params){
data[[1]]$hello <- 'world'
print("In SETUP_DATA:")
print(" names(data):")
print(names(data[[1]]))
print("")
data
},
compute_layout = function(data, params) {
id <- seq_len(params$n)
print("In COMPUTE_LAYOUT:")
print(" names(data):")
print(names(data[[1]]))
print("")
dims <- wrap_dims(params$n, params$nrow, params$ncol)
layout <- data.frame(PANEL = factor(id))
if (params$as.table) {
layout$ROW <- 1+as.integer((id - 1L) %/% dims[2] + 1L)
} else {
layout$ROW <- 1+as.integer(dims[1] - (id - 1L) %/% dims[2])
}
layout$COL <- 2+as.integer((id - 1L) %% dims[2] + 1L)
layout <- layout[order(layout$PANEL), , drop = FALSE]
rownames(layout) <- NULL
# Add scale identification
layout$SCALE_X <- if (params$free$x) id else 1L
layout$SCALE_Y <- if (params$free$y) id else 1L
cbind(layout, .bootstrap = id)
},
map_data = function(data, layout, params) {
print("In MAP_DATA:")
print(" names(data):")
print(names(data))
print("")
if (is.null(data) || nrow(data) == 0) {
return(cbind(data, PANEL = integer(0)))
}
n_samples <- round(nrow(data) * params$prop)
new_data <- lapply(seq_len(params$n), function(i) {
cbind(data[sample(nrow(data), n_samples), , drop = FALSE], PANEL = i)
})
do.call(rbind, new_data)
}
)
ggplot(diamonds, aes(carat, price)) +
geom_point(alpha = 0.1) +
facet_bootstrap(n = 9, prop = 0.05)
with outputs:
[1] "In SETUP_DATA:"
[1] " names(data):"
[1] "carat" "cut" "color" "clarity" "depth" "table"
[7] "price" "x" "y" "z" "hello"
[1] ""
[1] "In COMPUTE_LAYOUT:"
[1] " names(data):"
[1] "carat" "cut" "color" "clarity" "depth" "table"
[7] "price" "x" "y" "z" "hello"
[1] ""
[1] "In MAP_DATA:"
[1] " names(data):"
[1] "carat" "cut" "color" "clarity" "depth" "table"
[7] "price" "x" "y" "z"
[1] ""
notice how my hello
column is available in compute_layout
but not map_data
As a workaround hack, I CAN create some columns and pass them through as parameters
using setup_params
. This is a bit gross because they're not "parameters" conceptually, they're data. But if all else fails - I will take this approach
library(ggplot2)
facet_bootstrap <- function(n = 9, prop = 0.2, nrow = NULL, ncol = NULL,
scales = "fixed", shrink = TRUE, strip.position = "top") {
facet <- facet_wrap(~.bootstrap, nrow = nrow, ncol = ncol, scales = scales,
shrink = shrink, strip.position = strip.position)
facet$params$n <- n
facet$params$prop <- prop
ggproto(NULL, FacetBootstrap,
shrink = shrink,
params = facet$params
)
}
FacetBootstrap <- ggproto("FacetBootstrap", FacetWrap,
setup_params = function(data, params){
params$hello <- 'world'
print("In SETUP_DATA:")
print(" params$hello:")
print(params$hello)
print("")
params
},
compute_layout = function(data, params) {
id <- seq_len(params$n)
print("In COMPUTE_LAYOUT:")
print(" params$hello:")
print(params$hello)
print("")
dims <- wrap_dims(params$n, params$nrow, params$ncol)
layout <- data.frame(PANEL = factor(id))
if (params$as.table) {
layout$ROW <- 1+as.integer((id - 1L) %/% dims[2] + 1L)
} else {
layout$ROW <- 1+as.integer(dims[1] - (id - 1L) %/% dims[2])
}
layout$COL <- 2+as.integer((id - 1L) %% dims[2] + 1L)
layout <- layout[order(layout$PANEL), , drop = FALSE]
rownames(layout) <- NULL
# Add scale identification
layout$SCALE_X <- if (params$free$x) id else 1L
layout$SCALE_Y <- if (params$free$y) id else 1L
cbind(layout, .bootstrap = id)
},
map_data = function(data, layout, params) {
print("In MAP_DATA:")
print(" params$hello:")
print(params$hello)
print("")
if (is.null(data) || nrow(data) == 0) {
return(cbind(data, PANEL = integer(0)))
}
n_samples <- round(nrow(data) * params$prop)
new_data <- lapply(seq_len(params$n), function(i) {
cbind(data[sample(nrow(data), n_samples), , drop = FALSE], PANEL = i)
})
do.call(rbind, new_data)
}
)
ggplot(diamonds, aes(carat, price)) +
geom_point(alpha = 0.1) +
facet_bootstrap(n = 9, prop = 0.05)
with the following output
[1] "In SETUP_DATA:"
[1] " params$hello:"
[1] "world"
[1] ""
[1] "In COMPUTE_LAYOUT:"
[1] " params$hello:"
[1] "world"
[1] ""
[1] "In MAP_DATA:"
[1] " params$hello:"
[1] "world"
[1] ""
data
ONCE, and then make them available for both map_data
AND compute_layout
for a Facet
ggproto
?Thanks in advance!
TL;DR: set a new column in every list-element of data
in the setup_data
function.
It seems that setup_params and setup_data are meant for this exact use case.
That's right, but I get the impression from your question that some confusion exists about the order of operations of data ingestion. Facets and coordinates are part of the 'layout' of a plot. Before the layout is setup, layers setup their data (sometimes making a copy of the global data). Then, the layout can inspect the data and make adjustments (typically appending a PANEL column). If we inspect/print to console ggplot2:::Layout$setup
, we see the following (comments by me):
<ggproto method>
<Wrapper function>
function (...)
f(..., self = self)
<Inner function (f)>
function (self, data, plot_data = new_data_frame(), plot_env = emptyenv())
{
data <- c(list(plot_data), data)
# First `setup_params` is used
self$facet_params <- self$facet$setup_params(data, self$facet$params)
self$facet_params$plot_env <- plot_env
# Second, `setup_data` is used
data <- self$facet$setup_data(data, self$facet_params)
self$coord_params <- self$coord$setup_params(data)
data <- self$coord$setup_data(data, self$coord_params)
# Third, `compute_layout` is used.
self$layout <- self$facet$compute_layout(data, self$facet_params)
self$layout <- self$coord$setup_layout(self$layout, self$coord_params)
check_layout(self$layout)
# Lastly, `map_data` is used for every data *except* the global data!
lapply(data[-1], self$facet$map_data, layout = self$layout,
params = self$facet_params)
}
So from this we learn that the order of operations is setup_params
--> setup_data
--> compute_layout
--> map_data
. Note that map_data
starts with lapply(data[-1], ...)
wherein data
is a list with data.frames with the global data in position 1 and layer data thereafter.
Your setup_data
method only applies data[[1]]$hello <- 'world'
to the global data and not to layer data. Replacing that line with data <- lapply(data, cbind, hello = "world")
applies it to global data and layer data. At this point, every layer already has it's own (copy of the global) data, so from an efficiency standpoint, there is not a lot the facets can do to efficiently append a column to the global data that layers can 'inherit'.
To be more explicit, this is what I'm proposing:
FacetBootstrap <- ggproto(
"FacetBootstrap", FacetWrap,
setup_data = function(data, params){
data <- lapply(data, cbind, hello = "world")
print("In SETUP_DATA:")
print(" names(data):")
print(names(data[[1]]))
print("")
data
},
...other code...
)