r purrr

How to cut down on argument redundancy in purrr::map


Map statements are just awesome, but their inputs often feel needlessly redundant. For example, in the block below, it seems silly to have to re-list the variables I've already requested from the tibble. In short, I wish I could just put \() for the anonymous functions. Anyone have a way to cut down the redundancy? Of course, I could do this task below with mutate, but I'm just trying to demonstrate what I'm looking for.

library(tidyverse)

mtcars |> 
  as_tibble() |> 
  mutate(
    test = pmap_chr(
      .l = list(mpg,cyl),
      .f = \(mpg,cyl){ #Ahhhh! So redundant!!!
        str_glue("{mpg}_{cyl}")
      }
    )
  )

PS: "~" is no longer best practice, even though that helps me.

PPS: This is not an official part of the question since it is opinion-based, but do you have a preference for using map in mutate vs. on a list directly? I've found using them with mutate to be more organized and to require less setup, but I've always wondered whether this was a good idea. I don't see it as often as people just setting up lists and feeding them into purrr::map.


Solution

  • As pointed out in the comments, we do not actually need pmap_* here, but for the purposes of answering the question let us say that we must use it.

    Then we can use a formula that stands for a function, ~ with(list(...), ..whatever..), to avoid repeating the arguments while still being able to refer to them by name. In this code we have used dot (.) for brevity, but if we were using |>, or if the mutate followed a group_by, we would use the more verbose pick(everything()) instead.

    library(dplyr)
    library(purrr)
    library(stringr)
    
    mtcars[1:3, 1:3] %>%
      mutate(test = pmap(., ~ with(list(...), str_glue("{mpg}_{cyl}"))))
    ##                mpg cyl disp   test
    ## Mazda RX4     21.0   6  160   21_6
    ## Mazda RX4 Wag 21.0   6  160   21_6
    ## Datsun 710    22.8   4  108 22.8_4
    

    If it were ok to use str_glue_data we could simplify it slightly.

    mtcars[1:3, 1:3] %>%
      mutate(test = pmap(., ~ str_glue_data(list(...), "{mpg}_{cyl}")))
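
    For the native-pipe case mentioned above, where dot (.) is not available, the same idea can be sketched with pick(everything()) and a \(...) lambda in place of the formula. This is only a sketch: it assumes dplyr 1.1.0 or later (for pick) and R 4.1 or later (for |> and \()), and the name res is just for illustration.

    ```r
    library(dplyr)
    library(purrr)
    library(stringr)

    # pick(everything()) hands the current columns to pmap_chr as a data frame,
    # so each row arrives in the lambda as named arguments via `...`.
    res <- mtcars[1:3, 1:3] |>
      mutate(
        test = pmap_chr(
          pick(everything()),
          \(...) with(list(...), str_glue("{mpg}_{cyl}"))  # columns referenced by name only
        )
      )

    res
    ```

    Since the lambda takes ..., no column names have to be repeated in its signature; they appear only inside the expression.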
    

    Here is a different example to show that with/list works in other situations too:

    dat <- data.frame(a = 11:13, b = 21:23, c = 1:3)
    dat %>%
      mutate(p.value = pmap_dbl(., ~ with(list(...), prop.test(a, b)$p.value)))
    ##    a  b c   p.value
    ## 1 11 21 1 1.0000000
    ## 2 12 22 2 0.8311704
    ## 3 13 23 3 0.6766573
    

    auxfun

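    We define a short utility function that generates the auxiliary function to be used in pmap_*. It captures its argument unevaluated with substitute and returns a function that evaluates that expression using the values pmap_* passes for each row as the data: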

    auxfun <- function(e) {
      .s <- substitute(e)
      function(...) eval(.s, list(...))
    }
    
    mtcars[1:3, 1:3] %>%
      mutate(test = pmap_chr(., auxfun( str_glue("{mpg}_{cyl}") ) ) )
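
    The same helper should work for the prop.test example from earlier, since it only needs the column names to be visible when the captured expression is evaluated. A sketch, repeating the auxfun definition so that it runs on its own (the name res is just for illustration):

    ```r
    library(dplyr)
    library(purrr)

    # Same helper as above: capture the expression, evaluate it once per row.
    auxfun <- function(e) {
      .s <- substitute(e)                 # unevaluated expression
      function(...) eval(.s, list(...))   # row values supplied as named arguments
    }

    dat <- data.frame(a = 11:13, b = 21:23, c = 1:3)

    res <- dat %>%
      mutate(p.value = pmap_dbl(., auxfun(prop.test(a, b)$p.value)))

    res
    ```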
    

    .by

    It should be pointed out that simply using the mutate argument .by= is simpler and more compact.

    For the .by= argument we can use any column, or set of columns, whose values uniquely identify each row. If there are no duplicate rows we can use everything(). If there might be duplicate rows, first create a column of unique values with mutate(row = row_number()) and then use .by = row.

    The first example below does not actually need .by=, but we include it because, as discussed at the beginning, the question assumes that the expression must be evaluated separately for each row.

    mtcars[1:3, 1:3] %>%
      mutate(test = str_glue("{mpg}_{cyl}"), .by = everything())
    
    dat %>%
      mutate(p.value = prop.test(a, b)$p.value, .by = c)
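
    For the duplicate-rows case described above, a minimal sketch of the row_number() approach follows. The data frame dat_dup is made up for illustration (its first two rows are identical), and .by= assumes dplyr 1.1.0 or later.

    ```r
    library(dplyr)

    # Hypothetical data in which rows 1 and 2 are duplicates.
    dat_dup <- data.frame(a = c(12, 12, 13), b = c(22, 22, 23))

    res <- dat_dup %>%
      mutate(row = row_number()) %>%                            # unique id per row
      mutate(p.value = prop.test(a, b)$p.value, .by = row) %>%  # one test per row
      select(-row)                                              # drop the helper column

    res
    ```

    With .by = everything() the two identical rows would fall into the same group and be passed to prop.test together, which is why the helper column is used here.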