
Passing arguments to pmap in mutate


I have a problem passing arguments to purrr::pmap when using it inside mutate. I don't understand why some calls work and others don't.

My example data:

library(dplyr)
library(purrr)

sdf <- tibble(
  col_id  = c("id1",  "id2", "id3", "id4", "id5", "id6",  "id7",  "id8", "id9", "id10"),
  col_a  = c(0.7,  0.3, 1.4, 0.7, 0.5, 1.1,  0.1,  0.6, 1.7, 0.5),
  col_b  = c(NA, 0.6, 0.2, 0.2, 0.7, 0.2, 0.7,  3.7, 0.7, 0.7),
  col_c  = c(0.3, 0.4,  1.0,  NA,  3.1,  0.2, 0.4,  1.0, 0.1, 0.5))

params = c("col_a", "col_b", "col_c")

Then I want to apply some functions row-wise using pmap_dbl.

The first snippet (below) evaluates as intended.

# code 1
sdf_2 <- sdf %>% 
  select(all_of(params)) %>% 
  mutate(sum_p = pmap_dbl(., sum, na.rm = TRUE))

But the same syntax doesn't work with a different function:

sdf_2 <- sdf %>% 
  select(all_of(params)) %>% 
  mutate(mean_p = pmap_dbl(., mean, na.rm = TRUE))

Error in mutate(., mean_p = pmap_dbl(., mean, na.rm = TRUE)) :
Caused by error in mean.default():
! argument "x" is missing, with no default

Also, when I try to pass parameters to the sum function directly, rather than through ..., it does not work:

sdf_2 <- sdf %>% 
  select(all_of(params)) %>% 
  mutate(sum_p = pmap_dbl(., sum(na.rm = TRUE)))

Error in mutate(., sum_p = pmap_dbl(., sum(na.rm = TRUE))) :
Caused by error in pluck():
! argument "x" is missing, with no default

What is the correct way to pass parameters to functions inside pmap when working row-wise across a whole data frame?

Next question: Is there any way to pass the column names stored in params so that the function in pmap is applied only to those columns? select(all_of(params)) works, but the resulting data frame has no id column. It's easy to recreate, but it would be nice not to remove it at all.


Solution

  • Why can't I pass mean to pmap?

    Try:

    mean(0.7, NA, 0.3, na.rm = TRUE)
    sum(0.7, NA, 0.3, na.rm = TRUE)
    

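    mean takes its data as a single argument x, whereas sum takes ... directly (check the documentation). pmap passes each row's values as separate named arguments (col_a, col_b, col_c), none of which matches x, hence the error. You'll need to combine the values into one vector first: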

    mean(c(0.7, NA, 0.3), na.rm = TRUE)
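To see what pmap actually hands to the function, here is a minimal sketch. The helper arg_names is hypothetical (it is not part of purrr) and only reports the argument names it receives; the one-row list stands in for a row of the example data.

```r
library(purrr)

# Hypothetical helper (not part of purrr): reports the names of the arguments it was called with.
arg_names <- function(...) paste(names(list(...)), collapse = ", ")

# One "row" of the example data, written as a list of length-1 columns.
one_row <- list(col_a = 0.7, col_b = NA, col_c = 0.3)

# pmap calls the function once per row, passing the columns as named arguments.
passed <- pmap_chr(one_row, arg_names)
print(passed)

# mean() has only `x` and `...` as formal arguments, so none of the names above matches `x`.
mean_formals <- names(formals(mean))
print(mean_formals)
```

Because the values arrive under the column names, a function whose data argument is called x never receives them, while a function that collects everything through ... does.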
    

    Applied to the data frame:

    library(dplyr)
    library(purrr)
    
    sdf |> 
      mutate(mean_p = pmap_dbl(across(all_of(params)), ~ mean(c(...), na.rm = TRUE)))
    

    Output:

    # A tibble: 10 × 5
       col_id col_a col_b col_c mean_p
       <chr>  <dbl> <dbl> <dbl>  <dbl>
     1 id1      0.7  NA     0.3  0.5  
     2 id2      0.3   0.6   0.4  0.433
     3 id3      1.4   0.2   1    0.867
     4 id4      0.7   0.2  NA    0.45 
     5 id5      0.5   0.7   3.1  1.43 
     6 id6      1.1   0.2   0.2  0.5  
     7 id7      0.1   0.7   0.4  0.4  
     8 id8      0.6   3.7   1    1.77 
     9 id9      1.7   0.7   0.1  0.833
    10 id10     0.5   0.7   0.5  0.567
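The question's third attempt, pmap_dbl(., sum(na.rm = TRUE)), fails for a different reason: the second argument must be a function, but sum(na.rm = TRUE) is a call that is evaluated immediately (it returns 0), and purrr most likely then treats that number as an extractor, which would explain the pluck() error. A hedged sketch of the alternative, wrapping the call in an anonymous function so that extra arguments are set inside its body (sdf_small is a made-up two-row subset of the example data, and the \(...) syntax assumes R 4.1 or later):

```r
library(purrr)

# Made-up two-row subset of the example data, parameter columns only.
sdf_small <- data.frame(
  col_a = c(0.7, 0.3),
  col_b = c(NA, 0.6),
  col_c = c(0.3, 0.4)
)

# Pass a function, not a call: `...` receives one row's values,
# and na.rm is fixed inside the function body.
sum_p  <- pmap_dbl(sdf_small, \(...) sum(..., na.rm = TRUE))
mean_p <- pmap_dbl(sdf_small, \(...) mean(c(...), na.rm = TRUE))

print(sum_p)
print(mean_p)
```

The same anonymous-function form works for any function, whether it takes ... or a single vector, which makes it a reasonable default when the extra arguments differ between functions.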
    

  • How do I specify variables in pmap?

    1. With cur_data() (deprecated since dplyr 1.1.0 in favor of pick())
    library(dplyr)
    library(purrr)
    
    sdf |>
      mutate(sum_p = pmap_dbl(select(cur_data(), all_of(params)), sum, na.rm = TRUE))
    
    2. With across
    library(dplyr)
    library(purrr)
    
    sdf |> 
      mutate(sum_p = pmap_dbl(across(all_of(params)), sum, na.rm = TRUE))
    
    3. Manual list
    library(dplyr)
    library(purrr)
    
    sdf |>
      mutate(sum_p = pmap_dbl(list(col_a, col_b, col_c), sum, na.rm = TRUE))
    
    4. With unquote-splicing
    library(dplyr)
    library(purrr)
    library(rlang)
    
    sdf |>
      mutate(sum_p = pmap_dbl(list(!!!syms(params)), sum, na.rm = TRUE))
    

    Output:

    # A tibble: 10 × 5
       col_id col_a col_b col_c sum_p
       <chr>  <dbl> <dbl> <dbl> <dbl>
     1 id1      0.7  NA     0.3   1  
     2 id2      0.3   0.6   0.4   1.3
     3 id3      1.4   0.2   1     2.6
     4 id4      0.7   0.2  NA     0.9
     5 id5      0.5   0.7   3.1   4.3
     6 id6      1.1   0.2   0.2   1.5
     7 id7      0.1   0.7   0.4   1.2
     8 id8      0.6   3.7   1     5.3
     9 id9      1.7   0.7   0.1   2.5
    10 id10     0.5   0.7   0.5   1.7
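On recent dplyr versions the same column selection can also be written with pick(). This is a sketch under the assumption that dplyr 1.1.0 or later is installed; sdf_small is a made-up two-row subset of the example data, not the original tibble.

```r
library(dplyr)
library(purrr)

# Made-up two-row subset of the example data, including the id column.
sdf_small <- tibble(
  col_id = c("id1", "id2"),
  col_a  = c(0.7, 0.3),
  col_b  = c(NA, 0.6),
  col_c  = c(0.3, 0.4)
)
params <- c("col_a", "col_b", "col_c")

# pick() selects only the columns named in params for pmap,
# while mutate() keeps every original column (including col_id).
res <- sdf_small |>
  mutate(sum_p = pmap_dbl(pick(all_of(params)), \(...) sum(..., na.rm = TRUE)))

print(res)
```

Since the selection happens inside mutate, there is no need to drop and re-attach the id column.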
    

    The fast way: Using rowMeans and rowSums with across:

    library(dplyr)
    
    sdf |> mutate(mean_p = rowMeans(across(all_of(params)), na.rm = TRUE))
    sdf |> mutate(sum_p = rowSums(across(all_of(params)), na.rm = TRUE))
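As a quick sanity check, the vectorised and the pmap-based row means can be compared directly. This is only a sketch on a made-up two-row subset (sdf_small), not a benchmark.

```r
library(purrr)

# Made-up two-row subset of the example data, parameter columns only.
sdf_small <- data.frame(
  col_a = c(0.7, 0.3),
  col_b = c(NA, 0.6),
  col_c = c(0.3, 0.4)
)

# Vectorised version: one call for the whole data frame.
row_means_fast <- rowMeans(sdf_small, na.rm = TRUE)

# pmap version: one call to mean() per row.
row_means_pmap <- pmap_dbl(sdf_small, \(...) mean(c(...), na.rm = TRUE))

# unname() drops any row names so that only the values are compared.
same <- isTRUE(all.equal(unname(row_means_fast), unname(row_means_pmap)))
print(same)
```

rowMeans and rowSums only cover means and sums, so the pmap forms remain useful for other row-wise functions.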
    

    Update: added a fourth way.