Map statements are just awesome, but their inputs often feel needlessly redundant. For example, in the block below, it seems silly to have to re-list the variables I've already requested from the tibble. In short, I wish I could just put \() for the anonymous functions. Anyone have a way to cut down the redundancy? Of course, I could do this task below with mutate, but I'm just trying to demonstrate what I'm looking for.
library(tidyverse)
mtcars |>
  as_tibble() |>
  mutate(
    test = pmap_chr(
      .l = list(mpg, cyl),
      .f = \(mpg, cyl) { # Ahhhh! So redundant!!!
        str_glue("{mpg}_{cyl}")
      }
    )
  )
PS: "~" is no longer best practice, even though that helps me.
PPS: This is not an official part of the question since it is opinion-based, but do you have a preference for using map in mutate vs on a list directly? I've found using them with mutate to be more organized and require less setup, but I've always wondered if this was a good idea. I don't see it as often as people just setting up lists and feeding them into purrr::map.
As pointed out in the comments, we do not actually need pmap_* here, but let us say for purposes of answering the question that we must use it.
Then we can use a formula representing a function, ~ with(list(...), ..whatever..), to avoid repeating the arguments while still being able to refer to them by name. In this code we have used dot (.) for brevity, but if we were using |>, or if the mutate were in a group_by, use the more verbose pick(everything()) instead.
library(dplyr)
library(purrr)
library(stringr)
mtcars[1:3, 1:3] %>%
mutate(test = pmap(., ~ with(list(...), str_glue("{mpg}_{cyl}"))))
## mpg cyl disp test
## Mazda RX4 21.0 6 160 21_6
## Mazda RX4 Wag 21.0 6 160 21_6
## Datsun 710 22.8 4 108 22.8_4
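For the native pipe case mentioned above, where the dot placeholder is not available, a minimal sketch using pick(everything()) might look like the following. It assumes dplyr 1.1.0 or later (where pick() is available) and swaps pmap for pmap_chr so the new column is a plain character vector; the name res is only for illustration.

```r
library(dplyr)
library(purrr)
library(stringr)

# pick(everything()) stands in for the magrittr dot: it returns the current
# columns as a data frame, which pmap_chr() then walks row by row.
res <- mtcars[1:3, 1:3] |>
  mutate(test = pmap_chr(
    pick(everything()),
    ~ with(list(...), str_glue("{mpg}_{cyl}"))
  ))

res
```

The same substitution should apply inside a grouped mutate, where pick(everything()) returns only the rows of the current group.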
If it were ok to use str_glue_data we could simplify it slightly.
mtcars[1:3, 1:3] %>%
mutate(test = pmap(., ~ str_glue_data(list(...), "{mpg}_{cyl}")))
Here is a different example to show that with/list works in other situations too:
dat <- data.frame(a = 11:13, b = 21:23, c = 1:3)
dat %>%
mutate(p.value = pmap_dbl(., ~ with(list(...), prop.test(a, b)$p.value)))
## a b c p.value
## 1 11 21 1 1.0000000
## 2 12 22 2 0.8311704
## 3 13 23 3 0.6766573
Alternatively, we can define a short utility function which generates the auxiliary function to be used in pmap_*:
auxfun <- function(e) {
  .s <- substitute(e)
  function(...) eval(.s, list(...))
}
mtcars[1:3, 1:3] %>%
mutate(test = pmap_chr(., auxfun( str_glue("{mpg}_{cyl}") ) ) )
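As a rough check that the helper is not specific to str_glue, here is a sketch applying it to the earlier prop.test example. The helper and data are repeated so the snippet runs on its own; res is just an illustrative name.

```r
library(dplyr)
library(purrr)

# Same helper as above, repeated so this snippet is self-contained.
auxfun <- function(e) {
  .s <- substitute(e)                # capture the expression unevaluated
  function(...) eval(.s, list(...))  # evaluate it against one row's values
}

dat <- data.frame(a = 11:13, b = 21:23, c = 1:3)

# Each row is passed to the generated function as named arguments a, b, c.
res <- dat %>%
  mutate(p.value = pmap_dbl(., auxfun(prop.test(a, b)$p.value)))

res
```

This should reproduce the p.value column shown for the with/list version, since both evaluate the same expression once per row.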
It should be pointed out that simply using the mutate argument .by= is simpler and more compact.
For the .by= argument we can use any unique column or set of columns. If there are no duplicate rows we can use everything(). If there might be duplicate rows, create a column of unique values first using mutate(row = row_number()) and then use .by = row.
In the first example below we don't actually need .by=, since str_glue is vectorized, but, as discussed at the beginning, we include it because the question assumes the expression must be evaluated separately for each row.
mtcars[1:3, 1:3] %>%
mutate(test = str_glue("{mpg}_{cyl}"), .by = everything())
dat %>%
mutate(p.value = prop.test(a, b)$p.value, .by = c)
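For the duplicate-row case described above, a minimal sketch of the row_number() approach could look like this. The data frame dup is hypothetical (made up here so that rows 1 and 2 are identical), and dplyr 1.1.0 or later is assumed for the .by= argument.

```r
library(dplyr)

# Hypothetical data in which rows 1 and 2 are exact duplicates.
dup <- data.frame(a = c(11, 11, 13), b = c(21, 21, 23))

res <- dup %>%
  mutate(row = row_number()) %>%                            # unique id per row
  mutate(p.value = prop.test(a, b)$p.value, .by = row) %>%  # one test per row
  select(-row)                                              # drop the helper column

res
```

With .by = everything() the two duplicate rows would fall into the same group, so prop.test would receive both rows at once rather than one at a time; the helper column avoids that.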