I've got a large dataset and want to apply a custom function over each of the columns.
I've written the function and it works when applied as a one-off on one column of the target dataset.
However, when I try to use purrr::map, the custom function throws an error about halfway through.
There's a select statement which references the column I want to use, and it fails with an error saying it can't reference columns that don't exist: Column `var1` doesn't exist.
I've put a reproducible example below. The actual data is 1,000s of columns wide.
Ultimately, I want to get an output where the relative risk of the outcome variable is listed against each column in the dataset:
column_name | rr_0 | rr_1 |
---|---|---|
var1 | 1 | 1.17 |
var2 | 1 | 1.03 |
var3 | ... | ... |
library(dplyr)
library(purrr)
library(tidyr)
set.seed(1)
# sample dataset
test_dat <- data.frame(var1 = rbinom(n = 10, size = 1, prob = 0.3),
                       var2 = rbinom(n = 10, size = 1, prob = 0.1),
                       var3 = rbinom(n = 10, size = 1, prob = 0.4),
                       outcome = rbinom(n = 10, size = 1, prob = 0.3))
test_dat
# get names of columns to iterate
over_vec <- names(test_dat)
over_vec <- over_vec[!(over_vec %in% c("outcome"))]
over_vec
# function I want to use
test_fun <- function(code, dataset){
  dataset <- dataset %>%
    group_by({{code}}) %>%
    summarise(n = n(), n_out = sum(outcome)) %>%
    mutate(risk = n_out/n * 100,
           rr = risk / risk[row_number() == 1]) %>%
    dplyr::select({{code}}, rr) %>%
    pivot_wider(names_from = {{code}}, values_from = rr)
  return(dataset)
}
# works for one column
test_fun(code = var1, dataset = test_dat)
# fails when iterated with purrr::map
output <- over_vec %>%
map(.x = ., .f = test_fun, dataset = test_dat)
output
You are passing the column names as a vector of strings, but your function is written to accept a bare symbol as its first argument. Note that you call it successfully with code = var1, not with code = "var1".
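To see the difference in isolation, here is a minimal sketch. The toy data frame `df` and the helper `count_by` are made up for illustration and are not part of your code. With a bare symbol, `{{ code }}` refers to the column; with a string, `group_by()` treats the string as a constant value, so the real column is dropped by `summarise()` and a later `select()` on it cannot find it.

```r
library(dplyr)

# Toy data, assumed purely for illustration
df <- data.frame(var1 = c(0, 1, 1), outcome = c(1, 0, 1))

# Hypothetical helper that embraces its first argument, like test_fun does
count_by <- function(code, dataset) {
  dataset %>%
    group_by({{ code }}) %>%
    summarise(n = n())
}

by_symbol <- count_by(var1, df)    # groups by the column var1
by_string <- count_by("var1", df)  # groups by the constant string "var1"

names(by_symbol)
names(by_string)
```

In the second call there is only one group, and the result has no column called var1, which is why the select step in your function fails.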
You could just convert over_vec to a list of symbols:
over_vec %>%
lapply(as.symbol) %>%
map(.x = ., .f = test_fun, dataset = test_dat)
#> [[1]]
#> # A tibble: 1 x 2
#> `0` `1`
#> <dbl> <dbl>
#> 1 1 1.17
#>
#> [[2]]
#> # A tibble: 1 x 2
#> `0` `1`
#> <dbl> <dbl>
#> 1 1 0
#>
#> [[3]]
#> # A tibble: 1 x 2
#> `0` `1`
#> <dbl> <dbl>
#> 1 1 1.17
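
If you want the single table from your question (column_name, rr_0, rr_1) rather than a list of tibbles, one option is to keep the column names as strings and select them with all_of(). The sketch below is only one way to do it: `test_fun_chr` is a hypothetical string-based variant of your function, it assumes dplyr 1.0 or later for across(), and it uses first(risk) in place of risk[row_number() == 1].

```r
library(dplyr)
library(purrr)
library(tidyr)

# Same sample data as in the question
set.seed(1)
test_dat <- data.frame(var1 = rbinom(n = 10, size = 1, prob = 0.3),
                       var2 = rbinom(n = 10, size = 1, prob = 0.1),
                       var3 = rbinom(n = 10, size = 1, prob = 0.4),
                       outcome = rbinom(n = 10, size = 1, prob = 0.3))
over_vec <- setdiff(names(test_dat), "outcome")

# Hypothetical variant of test_fun: `code` is a column name given as a
# character string, so it can be mapped over directly
test_fun_chr <- function(code, dataset) {
  dataset %>%
    group_by(across(all_of(code))) %>%
    summarise(n = n(), n_out = sum(outcome)) %>%
    mutate(risk = n_out / n * 100,
           rr = risk / first(risk)) %>%
    select(all_of(code), rr) %>%
    pivot_wider(names_from = all_of(code), values_from = rr,
                names_prefix = "rr_")
}

# Name the vector by itself so bind_rows() can use the names as an id column
rr_table <- over_vec %>%
  set_names() %>%
  map(test_fun_chr, dataset = test_dat) %>%
  bind_rows(.id = "column_name")

rr_table
```

bind_rows() fills in NA where a column only takes one of the two values, so a variable that is all 0 or all 1 still gets a row.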