I need to apply a custom function to a set of columns in a dataset and return a list. I can do this in lapply() but I am trying to work with purrr.
Toy data: three factor variables.
df <- data.frame(variableA = factor(sample(x = c("lessThan", "moreThan"),
                                           size = 20,
                                           replace = TRUE,
                                           prob = c(0.5, 0.5))),
                 variableB = factor(sample(x = c("lessThan", "moreThan"),
                                           size = 20,
                                           replace = TRUE,
                                           prob = c(0.2, 0.8))),
                 variableC = factor(sample(x = c("lessThan", "moreThan"),
                                           size = 20,
                                           replace = TRUE,
                                           prob = c(0.4, 0.6))))
Now we create the function, one that returns a dataframe breaking down the proportions of each level of the outcome variable, which we pass into the function as a string.
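library(dplyr)  # group_by(), summarise(), mutate(), %>%
library(purrr)  # map(), imap()

# Count and percentage of each level of the column named by the string `var`
countMoreLessFunct <- function(data, var) {
  data %>%
    group_by(.data[[var]]) %>%
    summarise(count = n()) %>%
    ungroup() %>%
    mutate(tot = sum(count),
           perc = round(x = count / tot * 100,
                        digits = 2))
}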
The function works fine with a single variable.
countMoreLessFunct(data = df,
                   var = "variableA")
# output
# variableA count tot perc
# <fct> <int> <int> <dbl>
# 1 lessThan 11 20 55
# 2 moreThan 9 20 45
It also works with lapply():
lapply(names(df), function(i) countMoreLessFunct(df, i))
But when I try it in purrr I get all sorts of errors:
df %>%
  map(.f = ~countMoreLessFunct(df, .x))
The above, for example, returns the error:
# Error in `map()`:
# ℹ In index: 1.
# ℹ With name: variableA.
# Caused by error in `group_by()`:
# ℹ In argument: `.data[[structure(c(1L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, `.
# Caused by error in `.data[[<fct>]]`:
# ! Must subset the data pronoun with a string, not a <factor> object.
I am lost. The problem seems to lie in the original function, perhaps in the fact that it requires a string? Any help appreciated.
In your example, map() iterates over the column contents (factors) rather than the column names your function expects. You can use {purrr}'s imap() instead, which passes each element's name (or its index, if the input is unnamed) as .y:
df %>%
  imap(.f = ~countMoreLessFunct(df, .y))
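To make the difference concrete, here is a minimal sketch that records what map() and imap() each hand to the function, and then shows a third option: mapping over the column names themselves. The two-column data frame and the set.seed() call are assumptions added only to keep the sketch small and repeatable; set_names() is used so the resulting list is named after the columns.

```r
library(dplyr)
library(purrr)

set.seed(1)  # assumption: only here to make the sketch repeatable
df <- data.frame(
  variableA = factor(sample(c("lessThan", "moreThan"), 20, replace = TRUE)),
  variableB = factor(sample(c("lessThan", "moreThan"), 20, replace = TRUE))
)

countMoreLessFunct <- function(data, var) {
  data %>%
    group_by(.data[[var]]) %>%
    summarise(count = n()) %>%
    ungroup() %>%
    mutate(tot = sum(count),
           perc = round(count / tot * 100, digits = 2))
}

# map() passes each column's contents as .x, so the function sees a factor
passed_by_map <- map_chr(df, ~ class(.x))

# imap() additionally passes each column's name as .y, which is a string
passed_by_imap <- imap_chr(df, ~ .y)

# Alternative: iterate over the names directly; set_names() names the
# character vector by itself, so the output list keeps the column names
res <- names(df) %>%
  set_names() %>%
  map(~ countMoreLessFunct(df, .x))
```

This last form is the closest purrr equivalent of the lapply(names(df), ...) call in the question, with the added benefit of a named result.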
Since you're into using {purrr}, you could also create your dataframe by mapping a list of "less-than" probabilities:
df <- list(.5, .2, .4) |>
  map_dfc(~ sample(x = c("lessThan", "moreThan"),
                   size = 20, replace = TRUE,
                   prob = c(.x, 1 - .x)) |>
            factor()) |>
  setNames(nm = paste0("variable", LETTERS[1:3]))
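As far as I know, map_dfc() has been superseded since purrr 1.0.0, so here is a hedged variant of the same idea that avoids it: name the probabilities up front, map over them, and convert the named list to a data frame. The names probs and df2 are made up for this sketch, and fixing the factor levels explicitly is an extra precaution that the original code does not take.

```r
library(purrr)

# Assumption: the "less-than" probabilities, named after the target columns
probs <- c(variableA = 0.5, variableB = 0.2, variableC = 0.4)

df2 <- probs |>
  map(\(p) factor(
    sample(c("lessThan", "moreThan"),
           size = 20, replace = TRUE,
           prob = c(p, 1 - p)),
    # keep both levels even if one of them is never drawn
    levels = c("lessThan", "moreThan")
  )) |>
  as.data.frame()
```

Because the input vector is already named, the columns come out as variableA, variableB and variableC without a separate setNames() step.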