I have a problem passing arguments to purrr::pmap when using it inside mutate.
I don't understand why some things work and some don't.
My example data:
library(dplyr)
library(purrr)
sdf <- tibble(
  col_id = c("id1", "id2", "id3", "id4", "id5", "id6", "id7", "id8", "id9", "id10"),
  col_a = c(0.7, 0.3, 1.4, 0.7, 0.5, 1.1, 0.1, 0.6, 1.7, 0.5),
  col_b = c(NA, 0.6, 0.2, 0.2, 0.7, 0.2, 0.7, 3.7, 0.7, 0.7),
  col_c = c(0.3, 0.4, 1.0, NA, 3.1, 0.2, 0.4, 1.0, 0.1, 0.5))
params <- c("col_a", "col_b", "col_c")
Then I want to apply some functions row-wise using pmap_dbl.
The first snippet (below) evaluates as intended.
# code 1
sdf_2 <- sdf %>%
select(all_of(params)) %>%
mutate(sum_p = pmap_dbl(., sum, na.rm = TRUE))
But the same syntax doesn't work with a different function:
sdf_2 <- sdf %>%
select(all_of(params)) %>%
mutate(mean_p = pmap_dbl(., mean, na.rm = TRUE))
Error in mutate(., mean_p = pmap_dbl(., mean, na.rm = TRUE)) : Caused by error in
: ! argument "x" is missing, with no default
Also, when I try to pass parameters to the sum function directly, not via ..., it does not work:
sdf_2 <- sdf %>%
select(all_of(params)) %>%
mutate(sum_p = pmap_dbl(., sum(na.rm = TRUE)))
Error in mutate(., sum_p = pmap_dbl(., sum(na.rm = TRUE))) : Caused by error in
: ! argument "x" is missing, with no default
What is the correct way to pass parameters to functions inside pmap when working on whole dataframe horizontally?
Next question:
Is there any way to pass the column names stored in params so that the function in pmap is applied only to them?
Using select(all_of(params)) works, but the resulting data frame has no id column. It's easy to recreate, but it would be nice not to remove it at all.
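Why can't I pass mean to pmap
pmap hands each row's values to the function as separate arguments, so the two calls behave like:
mean(0.7, NA, 0.3, na.rm = TRUE)
sum(0.7, NA, 0.3, na.rm = TRUE)
mean takes a single vector argument x, whereas sum takes ... directly (check the documentation). You'll need: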
mean(c(0.7, NA, 0.3), na.rm = TRUE)
sdf |>
  mutate(mean_p = pmap_dbl(across(all_of(params)), ~ mean(c(...), na.rm = TRUE)))
# A tibble: 10 × 5
col_id col_a col_b col_c mean_p
<chr> <dbl> <dbl> <dbl> <dbl>
1 id1 0.7 NA 0.3 0.5
2 id2 0.3 0.6 0.4 0.433
3 id3 1.4 0.2 1 0.867
4 id4 0.7 0.2 NA 0.45
5 id5 0.5 0.7 3.1 1.43
6 id6 1.1 0.2 0.2 0.5
7 id7 0.1 0.7 0.4 0.4
8 id8 0.6 3.7 1 1.77
9 id9 1.7 0.7 0.1 0.833
10 id10 0.5 0.7 0.5 0.567
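To see why the two functions behave differently, it can help to reproduce by hand what pmap does with a single row. This is only a minimal sketch: it assumes pmap forwards each column value as an argument named after its column, and row1, sum_res, mean_err and row_mean are names made up for the illustration.

```r
library(purrr)

# First row of the example data, as a named list (one element per column)
row1 <- list(col_a = 0.7, col_b = NA, col_c = 0.3)

# sum() collects all of its inputs through `...`, so the names do not matter
sum_res <- do.call(sum, c(row1, na.rm = TRUE))

# mean() expects one vector `x`; nothing here is named `x`, so the call fails.
# tryCatch() captures the error message instead of stopping the script.
mean_err <- tryCatch(
  do.call(mean, c(row1, na.rm = TRUE)),
  error = function(e) conditionMessage(e)
)

# Gathering the row into a single vector first gives mean() what it expects
row_mean <- pmap_dbl(row1, function(...) mean(c(...), na.rm = TRUE))

print(sum_res)
print(mean_err)
print(row_mean)
```

The wrapper function(...) mean(c(...), na.rm = TRUE) does the same job as the ~ mean(c(...), na.rm = TRUE) formula used above.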
How to specify variables in pmap
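# 1. select from the current data (cur_data() is deprecated as of dplyr 1.1.0; pick() replaces it)
sdf |>
  mutate(sum_p = pmap_dbl(select(cur_data(), all_of(params)), sum, na.rm = TRUE))
# 2. across() without a function returns the selected columns
sdf |>
  mutate(sum_p = pmap_dbl(across(all_of(params)), sum, na.rm = TRUE))
# 3. list the columns explicitly
sdf |>
  mutate(sum_p = pmap_dbl(list(col_a, col_b, col_c), sum, na.rm = TRUE))
# 4. turn the names in params into symbols and splice them in
sdf |>
  mutate(sum_p = pmap_dbl(list(!!!syms(params)), sum, na.rm = TRUE))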
# A tibble: 10 × 5
col_id col_a col_b col_c sum_p
<chr> <dbl> <dbl> <dbl> <dbl>
1 id1 0.7 NA 0.3 1
2 id2 0.3 0.6 0.4 1.3
3 id3 1.4 0.2 1 2.6
4 id4 0.7 0.2 NA 0.9
5 id5 0.5 0.7 3.1 4.3
6 id6 1.1 0.2 0.2 1.5
7 id7 0.1 0.7 0.4 1.2
8 id8 0.6 3.7 1 5.3
9 id9 1.7 0.7 0.1 2.5
10 id10 0.5 0.7 0.5 1.7
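If you need this for several functions, one option is to wrap the pattern in a small helper so that na.rm is passed explicitly and mean-like functions receive a single vector. This is a sketch, not the only way to do it: row_stat is a hypothetical helper name, and a three-row subset of the example data is used to keep it short.

```r
library(dplyr)
library(purrr)

# Three-row subset of the example data
sdf <- tibble(
  col_id = c("id1", "id2", "id3"),
  col_a  = c(0.7, 0.3, 1.4),
  col_b  = c(NA, 0.6, 0.2),
  col_c  = c(0.3, 0.4, 1.0)
)
params <- c("col_a", "col_b", "col_c")

# Hypothetical helper: apply `f` to each row of `cols_df`.
# The row is collected into one vector with c(...), so this works both for
# functions that take `...` (sum) and for functions that take `x` (mean).
row_stat <- function(cols_df, f, na.rm = TRUE) {
  pmap_dbl(cols_df, function(...) f(c(...), na.rm = na.rm))
}

out <- sdf |>
  mutate(
    sum_p  = row_stat(across(all_of(params)), sum),
    mean_p = row_stat(across(all_of(params)), mean)
  )

print(out)
```

Because the columns are selected inside mutate, col_id stays in the result.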
The fast way: Using rowMeans and rowSums with across
sdf |> mutate(mean_p = rowMeans(across(all_of(params)), na.rm = TRUE))
sdf |> mutate(sum_p = rowSums(across(all_of(params)), na.rm = TRUE))
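As a quick sanity check, the vectorised version can be compared with the pmap version on a small subset of the data. This is a sketch under the assumption that all selected columns are numeric; cmp and the column names mean_pmap and mean_row are made up for the comparison.

```r
library(dplyr)
library(purrr)

# Three-row subset of the example data
sdf <- tibble(
  col_a = c(0.7, 0.3, 1.4),
  col_b = c(NA, 0.6, 0.2),
  col_c = c(0.3, 0.4, 1.0)
)
params <- c("col_a", "col_b", "col_c")

cmp <- sdf |>
  mutate(
    # row-by-row: one function call per row
    mean_pmap = pmap_dbl(across(all_of(params)), function(...) mean(c(...), na.rm = TRUE)),
    # vectorised: a single call over the whole set of columns
    mean_row  = rowMeans(across(all_of(params)), na.rm = TRUE)
  )

# unname() drops any names so that only the values are compared
same <- isTRUE(all.equal(unname(cmp$mean_pmap), unname(cmp$mean_row)))
print(same)
```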
Update: added a fourth way.