I'm trying to create histograms per group and then return a summary. Per this answer, I can use {braces} and print to avoid issues when creating one plot and then moving on to another; however, this doesn't seem to acknowledge grouping:
library(dplyr)
library(magrittr)
library(ggplot2)
data(mtcars)
mtcars |>
  group_by(cyl) %T>%
  {print(ggplot(.) +
    geom_histogram(aes(x = carb)))} |>
  summarise(meancarb = mean(carb))
The above code works insofar as it creates a single histogram then the summary, however:
mtcars %T>%
  {print(ggplot(.) +
    geom_histogram(aes(x = carb)))} |>
  group_by(cyl) |>
  summarise(meancarb = mean(carb))
The above code produces exactly the same output, i.e. confirming that group_by isn't being acknowledged.
Does anyone know why the grouping isn't being used to create 1 histogram per unique cyl? Ideally I'd love to work out how to use Tee pipes to do this kinda thing more often, including saving the output to unique names, before continuing down the pipe. In general it feels like Tee pipes are underused, possibly because of the dearth of info about them, so if anyone has any cool examples to share, that might be great for the community.
Thanks!
Following divibisan's comment about dplyr::group_map (or group_walk):
mtcars |>
  group_by(cyl) %T>%
  group_walk(.f = ~ ggplot(.) +
    geom_histogram(aes(x = carb))) |>
  summarise(meancarb = mean(carb, na.rm = TRUE),
            sd3 = sd(carb, na.rm = TRUE) * 3)
This creates the summary table but no plot(s). Output identical for map and walk. Output also the same if I replace %T>% with |>. Ostensibly group_walk is doing the same as %T>%. With |> and group_map, I get:
Error in UseMethod("summarise"): no applicable method for 'summarise' applied to an object of class "list"
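A quick check of what each function hands back may help explain the difference seen here. This is only a sketch: make_hist is a hypothetical helper, and bins = 10 is an arbitrary choice to silence the default-bins message. group_map() returns a list of the per-group results, while group_walk() returns the grouped input invisibly, which is why it behaves like a built-in tee.

```r
library(dplyr)
library(ggplot2)

grouped <- group_by(mtcars, cyl)

# Hypothetical helper: builds (but does not print) one histogram per group.
# df is the group's rows, key is a one-row tibble with the group's cyl value.
make_hist <- function(df, key) {
  ggplot(df) + geom_histogram(aes(x = carb), bins = 10)
}

mapped <- group_map(grouped, make_hist)   # list with one element per group
walked <- group_walk(grouped, make_hist)  # the grouped input, returned invisibly

class(mapped)          # a plain list, so summarise() has no method for it
is_grouped_df(walked)  # still a grouped data frame, so summarise() works
```

Neither call draws anything, because a ggplot object is only rendered when it is printed, and nothing inside make_hist prints it.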
mtcars |>
  group_by(cyl) %T>%
  {print(group_walk(.f = ~ ggplot(.) +
    geom_histogram(aes(x = carb))))} |>
  summarise(meancarb = mean(carb, na.rm = TRUE),
            sd3 = sd(carb, na.rm = TRUE) * 3)
With print and braces:
Error in h(simpleError(msg, call)) : error in evaluating the argument 'x' in selecting a method for function 'print': argument ".data" is missing, with no default
Braces no print:
Error in group_map(.data, .f, ..., .keep = .keep): argument ".data" is missing, with no default
Print no braces: same as braces no print.
More interesting ideas coming forth, thanks to Ricardo, this:
mtcars |>
  group_split(cyl) |>
  map(.f = ~ ggplot(.) +
    geom_histogram(aes(x = carb)))
Works insofar as it produces 1 plot per group. Success! But: I can't find any combination of Tee/pipes which Tees off mtcars for the group_split AND map, and then resumes the main pipeline:
mtcars %T>%
  group_split(cyl) %T>%
  map(.f = ~ ggplot(.) +
    geom_histogram(aes(x = carb))) |>
  summarise(meancarb = mean(carb))
Error in map(): In index: 1. With name: mpg. Caused by error in fortify(): data must be a <data.frame>, or an object coercible by fortify(), not a double vector.
Also anything other than 2 pipes means the plots aren't created.
Trying this another way around, by reordering the pipe structure (which won't always be possible/desirable):
mtcars |>
  group_by(cyl) %T>%
  summarise(meancarb = mean(carb)) |>
  ungroup() |>
  group_split(cyl) |>
  map(.f = ~ ggplot(.) +
    geom_histogram(aes(x = carb)))
This creates the 3 plots but doesn't print the summary. Any combination of {braces} and/or print around the summary line gives:
Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'mean': object 'carb' not found.
Does anyone know whether the Tee pipe is explicitly for a single command, i.e. you can't pipe another command onto the tee branch, and then return to the main pipe? Thanks all
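One reading of the map() error above: %T>% always passes on its left-hand side, so in the two-tee version the second tee hands mtcars itself (not the split list) to map(), which then iterates over its columns, hence "With name: mpg". A possible way to keep several steps on a single tee branch is to wrap them in braces and refer to the incoming data as `.`. The following is a minimal sketch under that assumption; bins = 10 is an arbitrary choice and walk() is used instead of map() because only the printing side effect is wanted.

```r
library(dplyr)
library(purrr)
library(magrittr)
library(ggplot2)

result <- mtcars %T>%
  {
    # Everything inside the braces is the tee branch; `.` is mtcars here.
    group_split(., cyl) |>
      walk(~ print(ggplot(.x) + geom_histogram(aes(x = carb), bins = 10)))
  } |>
  # The main pipeline resumes with the untouched mtcars.
  summarise(meancarb = mean(carb))

result
```

The braces have to be the right-hand side of %T>% (or %>%); the native |> does not accept a braced block there.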
Thanks zephyr. Followup question: how to do multi-command tee pipes without a formula-format first command?
mtcars |>
  summarise(sdd = sd(carb, na.rm = TRUE))
Works fine, prints a single value.
mtcars %T>%
  summarise(sdd = sd(carb, na.rm = TRUE)) |>
  summarise(
    meancarb = mean(carb, na.rm = TRUE),
    sd3 = sd(carb, na.rm = TRUE) * 3
  )
Doesn't print the value, performs the calculation invisibly then continues. Any combination of print and {braces} I've tried results in:
Error: function '{' not supported in RHS call of a pipe
or
Error in is.data.frame(x) : object 'carb' not found
Say I wanted, e.g.:
mtcars |>
  summarise(~{
    print(sdd = sd(carb))
    write_csv(file = "tmp.csv")
    .x
  }) |>
  summarise(meancarb = mean(carb))
Any thoughts? Thanks again!
You were on the right track with group_walk(), but you need to put the print() inside the mapped function:
library(dplyr)
library(purrr)
library(magrittr)
library(ggplot2)
mtcars |>
  group_by(cyl) %T>%
  group_walk(~ print(
    ggplot(.) + geom_histogram(aes(x = carb))
  )) |>
  summarise(
    meancarb = mean(carb, na.rm = TRUE),
    sd3 = sd(carb, na.rm = TRUE) * 3
  )
# A tibble: 3 × 3
    cyl meancarb   sd3
  <dbl>    <dbl> <dbl>
1     4     1.55  1.57
2     6     3.43  5.44
3     8     3.5   4.67
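The question also asked about saving each group's output under a unique name. One way this might be done, as a sketch rather than a recommendation: the mapped function in group_walk() receives the group key as its second argument (.y in formula notation), which can be used to build a file name. The output directory, the carb_hist_cyl file prefix, the PDF format and bins = 10 are all assumptions chosen for illustration.

```r
library(dplyr)
library(magrittr)
library(ggplot2)

out_dir <- tempdir()  # assumption: any writable directory would do

summary_tbl <- mtcars |>
  group_by(cyl) %T>%
  group_walk(~ {
    p <- ggplot(.x) + geom_histogram(aes(x = carb), bins = 10)
    # .y is a one-row tibble holding this group's key, here the cyl value
    ggsave(
      filename = file.path(out_dir, paste0("carb_hist_cyl", .y$cyl, ".pdf")),
      plot = p, width = 5, height = 4
    )
  }) |>
  summarise(meancarb = mean(carb, na.rm = TRUE))

summary_tbl
list.files(out_dir, pattern = "^carb_hist_cyl")
```

Each group writes its own file, and the summary is still computed on the grouped data because %T>% passes its left-hand side along.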
Note you can get the same result without using %T>%, because group_walk() returns its input invisibly. Here the plot is assigned to a name in the anonymous function and printed; the trailing .x is harmless, since group_walk() ignores the function's return value:
mtcars |>
  group_by(cyl) |>
  group_walk(~ {
    p <- ggplot(.x) + geom_histogram(aes(x = carb))
    print(p)
    .x
  }) |>
  summarise(
    meancarb = mean(carb, na.rm = TRUE),
    sd3 = sd(carb, na.rm = TRUE) * 3
  )
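For the follow-up about running several commands on the tee branch without a formula-style first command, one option is a braced block as the right-hand side of %T>%, with the incoming data referred to explicitly as `.`. This is a hedged sketch: csv_path is a hypothetical temporary file, and base write.csv() stands in for write_csv() so that no extra package is needed.

```r
library(dplyr)
library(magrittr)

csv_path <- tempfile(fileext = ".csv")  # hypothetical output location

result <- mtcars %T>%
  {
    # Tee branch: compute, print and save an intermediate value.
    sdd <- summarise(., sdd = sd(carb, na.rm = TRUE))
    print(sdd)
    write.csv(sdd, csv_path, row.names = FALSE)
  } |>
  # Main pipeline: continues with the original mtcars.
  summarise(
    meancarb = mean(carb, na.rm = TRUE),
    sd3 = sd(carb, na.rm = TRUE) * 3
  )

result
```

The "function '{' not supported in RHS call of a pipe" error comes from putting the braces after the native |>, and "object 'carb' not found" tends to appear when the data is not passed explicitly as `.` inside the braces, since magrittr does not insert the first argument there.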