r, tidyverse, purrr, lmertest

Entering column arguments from a list of dataframes in a custom function using purrr::map


I am writing a custom function that fits a linear mixed-effects model to each element of a list with the help of purrr::map. The code works fine as a standalone block, but when I turn it into a custom function, it's not clear how I should pass the arguments that correspond to individual columns of the list elements.

If I get the custom function working, I can use it for as many variables as I want. Otherwise, I'll have to keep copy-pasting the same code for different variables.

# libraries needed
library(purrr)
library(lmerTest)
data(mtcars)

# create a list of dataframes from mtcars based on a split
group_list <- split(mtcars, mtcars$am)

# goal: fit a linear mixed-effects model to each dataframe and combine the results neatly in a dataframe

# achieving this outside of a custom function
group_list %>%
  purrr::map(.x = (.),
             .f = ~ lmerTest::lmer(
               scale(mpg) ~ scale(wt) + (wt | cyl),
               data = (.),
               REML = FALSE
             )) %>%
  purrr::map(.f = ~ coef(summary(.))[-c(1),]) %>%
  base::do.call(what = cbind.data.frame, args = .) %>%
  tibble::rownames_to_column(df = ., var = "Effect")
#>       Effect          0             1
#> 1   Estimate -0.3318711 -9.089148e-01
#> 2 Std. Error  0.2104268  1.156500e-01
#> 3         df  0.6084658  1.300000e+01
#> 4    t value -1.5771334 -7.859187e+00
#> 5   Pr(>|t|)  0.4558206  2.714599e-06

# preparing the custom function to do the same
lmer_group <- function(list, x, y) {
  list %>%
    purrr::map(
      .x = (.),
      .f = ~ lmerTest::lmer(
        scale(y) ~ scale(x) + (x | cyl),
        data = (.),
        REML = FALSE
      )
    ) %>%
    purrr::map(.f = ~ coef(summary(.))[-c(1),]) %>%
    base::do.call(what = cbind.data.frame, args = .) %>%
    tibble::rownames_to_column(df = ., var = "Effect")
}

# doing the same analysis with a custom function
lmer_group(list = group_list, x = wt, y = mpg) # attempt 1
#> Error in scale(y): object 'mpg' not found
lmer_group(list = group_list, x = 'wt', y = 'mpg') # attempt 2
#> Error in colMeans(x, na.rm = TRUE): 'x' must be numeric
lmer_group(
  list = group_list,
  x = lapply(group_list, `[`, 'wt'),
  y = lapply(group_list, `[`, 'mpg')
) # attempt 3
#> Error in colMeans(x, na.rm = TRUE): 'x' must be numeric

Created on 2018-01-28 by the reprex package (v0.1.1.9000).


Solution

  • All of the indirection happens inside the formula, so I don't think rlang is needed at all.

    You can pass the names of the desired variables as strings and paste them together into a single string holding the model formula. Then use stats::as.formula() to convert that string into a proper formula that lmer accepts.

    lmer_group <- function(l, x_name, y_name) {
      fx <- paste0("scale(", y_name, ") ~ scale(", x_name, ") + (", x_name," | cyl)")
      print(paste("Evaluating: ", fx))
    
      l %>% 
        purrr::map(
          .f = ~ lmerTest::lmer(
            as.formula(fx),
            data = (.),
            REML = FALSE
          )
        ) %>%
        purrr::map(.f = ~ coef(summary(.))[-c(1),]) %>%
        base::do.call(what = cbind.data.frame, args = .) %>%
        tibble::rownames_to_column(df = ., var = "Effect")
    }
    
    lmer_group(l = group_list, x_name = 'wt', y_name = 'mpg')
    

    results:

    [1] "Evaluating:  scale(mpg) ~ scale(wt) + (wt | cyl)"
          Effect          0             1
    1   Estimate -0.3318712 -9.089148e-01
    2 Std. Error  0.2104267  1.156500e-01
    3         df  0.6084632  1.300000e+01
    4    t value -1.5771343 -7.859187e+00
    5   Pr(>|t|)  0.4558213  2.714599e-06
    

    I bet there's also an rlang approach with quo(). If you go with this solution, the question is essentially a duplicate of "Formula with dynamic number of variables".
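
The string-pasting approach above can also be made to accept bare column names, as in attempt 1 of the question. The following is a minimal sketch, assuming only base R: it uses substitute() rather than rlang, and build_lmer_formula and its group argument are hypothetical names introduced here for illustration, not part of lmerTest or purrr.

```r
# Hypothetical helper (not part of any package): builds the model formula
# from column names given either bare (wt) or quoted ("wt").
build_lmer_formula <- function(x, y, group = "cyl") {
  # substitute() captures the argument unevaluated: a bare name arrives as a
  # symbol, a quoted name arrives as a character string.
  to_name <- function(expr) if (is.character(expr)) expr else deparse(expr)
  x_name <- to_name(substitute(x))
  y_name <- to_name(substitute(y))

  # Same pasting step as in the answer, followed by conversion to a formula.
  stats::as.formula(
    paste0("scale(", y_name, ") ~ scale(", x_name, ") + (", x_name, " | ", group, ")")
  )
}

# Both calling styles are intended to produce the same formula.
f_bare <- build_lmer_formula(x = wt, y = mpg)
f_str  <- build_lmer_formula(x = "wt", y = "mpg")
```

The resulting formula could then be passed to lmerTest::lmer() inside purrr::map() in place of as.formula(fx). Note that this sketch still expects group as a string.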