Tags: r, dplyr, tidyverse, non-standard-evaluation

Pass a named list or character value to dplyr’s mutate to create multiple columns in a dataframe


My goal is to create multiple dataframe columns as in the following example.

# Goal
library(dplyr)
library(tibble)  # for rownames_to_column()

mtcars %>% 
  rownames_to_column("model") %>% 
  mutate(MAZDA = case_when(grepl('^Mazda', model) ~ 1, TRUE ~ 0),
         MERC  = case_when(grepl('^Merc',  model) ~ 1, TRUE ~ 0),
         VOLVO = case_when(grepl('^Volvo', model) ~ 1, TRUE ~ 0))

However, in my real-world application I cannot know in advance how many columns will be created, because the number of case_when conditions varies. This is why I want to build either a named list or a character string containing the conditions, based on the input data. One piece is missing, though: how can I pass a named list or a character string to dplyr’s mutate verb? Neither of the following examples works.

# Named list
condition1 <- list("case_when(grepl('^Mazda', model) ~ 1, TRUE ~ 0)",
                   "case_when(grepl('^Merc',  model) ~ 1, TRUE ~ 0)",
                   "case_when(grepl('^Volvo', model) ~ 1, TRUE ~ 0)")

names(condition1) <- c("MAZDA", "MERC", "VOLVO")

result1 <- mtcars %>% mutate(!!!condition1)

# Character string
condition2 <- 
"MAZDA = case_when(grepl('^Mazda', model) ~ 1, TRUE ~ 0),
 MERC  = case_when(grepl('^Merc',  model) ~ 1, TRUE ~ 0),
 VOLVO = case_when(grepl('^Volvo', model) ~ 1, TRUE ~ 0)"

result2 <- mtcars %>% mutate(eval(parse(text = condition2)))

How can I make them work? Is there an alternative base R approach to solve the problem?


Solution

    1) Assuming that the conditions are of the form shown, we can simplify the input to just the names in Names, create the columns with map_dfr and bind them to the original data frame.

    library(dplyr)
    library(purrr)
    
    Names <- c("Mazda", "Merc", "Volvo")  # input names
    
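    # \(x) is the lambda shorthand of R >= 4.1.
    # Inside { }, `.` is the vector of row names; unary + turns TRUE/FALSE into 1/0.
    mtcars %>% 
      bind_cols(
        rownames(.) %>%
        { map_dfr(set_names(Names, toupper), \(x) +startsWith(., x)) }
      )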
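The question also asks for a base R alternative. A minimal sketch of approach (1) without dplyr or purrr, assuming the conditions are still simple prefix matches on the row names; `flags` and `result` are illustrative names, not part of the original answer:

```r
# Base R sketch of approach (1): one 0/1 column per model-name prefix.
Names <- c("Mazda", "Merc", "Volvo")  # input names, as in (1)

# sapply() over a named character vector returns a 32 x 3 integer matrix
# whose column names are MAZDA, MERC, VOLVO; unary + turns TRUE/FALSE into 1/0.
flags <- sapply(setNames(Names, toupper(Names)),
                function(x) +startsWith(rownames(mtcars), x))

# cbind() with a data frame first dispatches to cbind.data.frame(),
# so the matrix columns are appended as ordinary data frame columns.
result <- cbind(mtcars, flags)
head(result[, c("MAZDA", "MERC", "VOLVO")])
```

Because the column set comes from the length of Names, the same code handles any number of conditions.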
    

    2) If the forms can differ, create a list of functions as shown and then apply them:

    L2 <- list(MAZDA = \(x) +startsWith(x, "Mazda"), 
               MERC = \(x) +startsWith(x, "Merc"), 
               VOLVO = \(x) +startsWith(x, "Volvo"))
    
    # Apply each function in L2 to the row names; list names become column names
    mtcars %>%
      bind_cols(rownames(.) %>% { map_dfr(L2, \(f) f(.)) })
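Since the number of conditions is only known at run time, the list of functions used in (2) can itself be generated from data. A minimal sketch, assuming each condition is a regular expression on the model name (mirroring the grepl() calls in the question); `patterns` and `make_flag` are hypothetical names introduced here, and R >= 4.1 is assumed for the `\(f)` lambda:

```r
library(dplyr)
library(purrr)

# One regular expression per new column; the names become column names.
# Being ordinary data, this vector can be built from the input at run time.
patterns <- c(MAZDA = "^Mazda", MERC = "^Merc", VOLVO = "^Volvo")

# Hypothetical helper: returns a function that flags strings matching `pattern` as 1/0.
make_flag <- function(pattern) {
  force(pattern)  # capture the value now rather than as a lazy promise
  function(x) +grepl(pattern, x)
}

L2 <- map(patterns, make_flag)  # named list of functions, like L2 in (2)

result <- mtcars %>%
  bind_cols(rownames(.) %>% { map_dfr(L2, \(f) f(.)) })
```

The force() call guards against the classic lazy-evaluation pitfall where every closure created in a loop ends up seeing the last pattern.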
     
    

    3) Or, if for some reason the conditions must be supplied as character strings, try the following (although to me (1) or (2) is preferable):

    L3 <- list(MAZDA = '\\(x) +startsWith(x, "Mazda")', 
               MERC = '\\(x) +startsWith(x, "Merc")', 
               VOLVO = '\\(x) +startsWith(x, "Volvo")')
    
    # str2lang() parses each string into a call; eval() turns that call into a function
    mtcars %>%
      bind_cols(rownames(.) %>% { map_dfr(L3, \(f) eval(str2lang(f))(.)) })
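The two attempts in the question can also be made to work directly. As far as I can tell, `!!!` splices the strings in condition1 as literal character values rather than as code, parse(text = condition2) fails because a comma-separated run of `name = value` pairs is not a single R expression, and mtcars has no `model` column until rownames_to_column() is applied. A minimal sketch that parses the strings with base R's str2lang() (R >= 3.6) before splicing; `exprs1`, `exprs2`, `result1` and `result2` are illustrative names:

```r
library(dplyr)
library(tibble)

# The question's named list of condition strings (names become column names).
condition1 <- list(MAZDA = "case_when(grepl('^Mazda', model) ~ 1, TRUE ~ 0)",
                   MERC  = "case_when(grepl('^Merc',  model) ~ 1, TRUE ~ 0)",
                   VOLVO = "case_when(grepl('^Volvo', model) ~ 1, TRUE ~ 0)")

# Parse each string into an unevaluated call; lapply() keeps the names.
exprs1 <- lapply(condition1, str2lang)

# The question's single string: wrap it in list(...) so it parses as one call,
# then drop the leading `list` symbol to keep only the named arguments.
condition2 <-
"MAZDA = case_when(grepl('^Mazda', model) ~ 1, TRUE ~ 0),
 MERC  = case_when(grepl('^Merc',  model) ~ 1, TRUE ~ 0),
 VOLVO = case_when(grepl('^Volvo', model) ~ 1, TRUE ~ 0)"
exprs2 <- as.list(str2lang(paste0("list(", condition2, ")")))[-1]

# `!!!` splices a named list of calls into mutate() as name = expression pairs.
# The `model` column is created from the row names first.
result1 <- mtcars %>% rownames_to_column("model") %>% mutate(!!!exprs1)
result2 <- mtcars %>% rownames_to_column("model") %>% mutate(!!!exprs2)
```

Parsing code from strings works, but the function-based versions above avoid building R code as text.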