I have data with multiple observable variables (say, x, y, z), indexed by another set of variables (sect and item). Each time I run an experiment I get such a set of observations. So for experiment "A" I get a value for each variable x, y, z for each value of the index pair (sect, item). Then I run another experiment "B", and get a whole new set of these variables.
What I want to do is simple: plot the observed values in one experiment against their respective values in another experiment, faceted by variable (so, plot x from A against x from B, and likewise for y and z). I would like to do this in a "tidy" way, but the only ways I can find seem more complicated than they should be.
Here's some simulated data to illustrate with:
library(tidyr)
library(dplyr)
library(ggplot2)
# Function to simulate an experiment
simdata <- function(experiment_name) {
  n <- 3 # number of sections
  m <- 7 # number of items per section
  tibble(
    # data points (section-item pairs)
    sect = factor(rep(1:n, each = m)), item = factor(rep(1:m, n)),
    # simulated observed values of three variables
    x = (1:(n * m))^1.05 + rnorm(n * m),
    y = (1:(n * m))^1.15 + rnorm(n * m, sd = 2),
    z = (1:(n * m))^1.25 + rnorm(n * m, sd = 4),
    experiment = experiment_name
  )
}
# Make an example dataset consisting of
# data from experiments named "A", "B", and "C"
set.seed(42)
d <- bind_rows(simdata("A"), simdata("B"), simdata("C"))
So, d is a dataset with data from three experiments. Here are the first few rows.
r$> d
# A tibble: 63 × 6
  sect  item      x     y      z experiment
  <fct> <fct> <dbl> <dbl>  <dbl> <chr>
1 1     1      2.37 -2.56  4.03  A
2 1     2      1.51  1.88 -0.528 A
3 1     3      3.53  5.97 -1.52  A
4 1     4      4.92  8.71  7.39  A
5 1     5      5.82  5.50  4.23  A
# … with 58 more rows
Now, say I want to plot the observations from experiment A against those from experiment B. I'll call these control and alternative:
# a list of two experiment names, to compare
exps <- list(control = "A", alternative = "B")
Now here's the part that seems overcomplicated. The best way I can find of doing what I want involves two pivots (which seems ugly). This results in one column per experiment. I then wrap the experiment names (with sym()) and immediately unwrap them (with !!) in order to refer to these columns by name, which seems necessary for tidy evaluation as far as I understand it.
This works, but is there a better way of doing this?
d_reshaped <- d |>
  ## There must be a better way of doing this reshaping
  pivot_longer(
    cols = -c(experiment, sect, item),
    names_to = "var", values_to = "value"
  ) |>
  pivot_wider(names_from = c("experiment"), values_from = "value")
d_reshaped |>
  ## But I'm mostly looking for a better way to do this dereferencing...
  ggplot(aes(
    !!sym(exps$control),
    !!sym(exps$alternative)
  )) +
  geom_point(alpha = 0.5) +
  facet_grid(~var) +
  coord_fixed() +
  labs(title = paste("Experiment", exps, collapse = " vs "))
I can see that instead of the wrapping/unwrapping !!sym part I could use aes_string(exps$control, exps$alternative), but that is soft deprecated, so I get the warning
Warning message:
`aes_string()` was deprecated in ggplot2 3.0.0.
ℹ Please use tidy evaluation ideoms with `aes()`
and so I suppose I shouldn't use it. Anyway, the main thing I wonder is whether there's a better way of doing the whole thing, since I think I must be overcomplicating this, but can't see how.
I think the way you are doing things is reasonable. It is a moderately complex data wrangling task to go from your existing data layout to the layout you need to plot.
However, you have only gone halfway in getting the data into the correct format, which is why you need to specify pairs of experiments in an external variable and use the !!sym(var) syntax. Although it takes a bit of effort, I think it is worth wrangling your data into the perfect plotting format:
plot_df <- combn(unique(d$experiment), 2) |>
  apply(2, \(v) filter(d, experiment %in% v)) |>
  lapply(\(x) split(x, x$experiment)) |>
  lapply(\(x) cbind(
    x[[1]] |> rename_with(~ paste0(.x, 1)),
    x[[2]] |> rename_with(~ paste0(.x, 2))
  )) |>
  bind_rows() |>
  mutate(pair_experiments = paste(experiment1, experiment2, sep = " vs ")) |>
  select(!matches("^(sect|item|experiment)")) |>
  pivot_longer(-pair_experiments,
    names_pattern = "(.)(\\d)",
    names_to = c("var", ".value")
  ) |>
  rename(xvar = `1`, yvar = `2`)
plot_df
#> # A tibble: 189 x 4
#>    pair_experiments var     xvar   yvar
#>    <chr>            <chr>  <dbl>  <dbl>
#>  1 A vs B           x      2.37   2.40
#>  2 A vs B           y     -2.56  -1.39
#>  3 A vs B           z      4.03   1.42
#>  4 A vs B           x      1.51   1.34
#>  5 A vs B           y      1.88   3.44
#>  6 A vs B           z     -0.528  0.689
#>  7 A vs B           x      3.53   4.47
#>  8 A vs B           y      5.97   3.10
#>  9 A vs B           z     -1.52   3.46
#> 10 A vs B           x      4.92   4.62
#> # i 179 more rows
#> # i Use `print(n = ...)` to see more rows
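The same four-column layout can probably also be reached with a self-join instead of split() and cbind(). This is only a sketch of that alternative, not the method used above: d_demo is a small made-up stand-in for d, plot_df_join is a hypothetical name, and the relationship argument assumes dplyr 1.1.0 or later.

```r
library(dplyr)
library(tidyr)

# Small stand-in for `d` (made-up values, same column layout as the question)
set.seed(1)
d_demo <- expand_grid(
  experiment = c("A", "B", "C"),
  sect = factor(1:2),
  item = factor(1:3)
) |>
  mutate(x = rnorm(n()), y = rnorm(n()), z = rnorm(n()))

# One row per (experiment, sect, item, var)
d_long <- d_demo |>
  pivot_longer(c(x, y, z), names_to = "var", values_to = "value")

# Join the long table to itself on the index columns, then keep each
# unordered pair of experiments once (A vs B, A vs C, B vs C)
plot_df_join <- d_long |>
  inner_join(
    d_long,
    by = c("sect", "item", "var"),
    suffix = c("_x", "_y"),
    relationship = "many-to-many" # assumes dplyr >= 1.1.0
  ) |>
  filter(experiment_x < experiment_y) |>
  transmute(
    pair_experiments = paste(experiment_x, experiment_y, sep = " vs "),
    var,
    xvar = value_x,
    yvar = value_y
  )
```

One possible reason to prefer the join is that observations are matched explicitly on sect, item and var, whereas cbind() relies on both experiments having their rows in the same order.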
So now you can get all your combinations in a single faceted plot:
ggplot(plot_df, aes(xvar, yvar)) +
  geom_point(alpha = 0.5) +
  facet_grid(pair_experiments ~ var, switch = "y") +
  coord_fixed() +
  labs(x = NULL, y = NULL)
And even if you don't want all pairs in one plot, it's trivial to filter to plot any pair you want:
ggplot(plot_df |> filter(pair_experiments == "A vs B"), aes(xvar, yvar)) +
  geom_point(alpha = 0.5) +
  facet_grid(. ~ var) +
  coord_fixed() +
  labs(x = "A", y = "B")
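On the narrower question of avoiding !!sym(): if you do keep one column per experiment, the .data pronoun inside aes() lets you look a column up by a string. A minimal sketch, where d_wide is a made-up table shaped like d_reshaped and exps is the list from the question:

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Made-up wide table shaped like d_reshaped: one column per experiment
set.seed(1)
d_wide <- expand_grid(
  sect = factor(1:2),
  item = factor(1:3),
  var = c("x", "y", "z")
) |>
  mutate(A = rnorm(n()), B = A + rnorm(n(), sd = 0.3))

exps <- list(control = "A", alternative = "B")

# .data[[name]] refers to the column whose name is stored in a string
p <- ggplot(d_wide, aes(.data[[exps$control]], .data[[exps$alternative]])) +
  geom_point(alpha = 0.5) +
  facet_grid(~var) +
  coord_fixed() +
  labs(x = exps$control, y = exps$alternative)
```

The axis labels are set explicitly here because the default labels may otherwise show the .data[[...]] expression, depending on the ggplot2 version.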