
How do you apply a function that has multiple inputs across multiple columns?


I have a function that takes two arguments (columns). It changes the value of one column (x) based on the value in another column (y).

fun <- function(x, y) {
  x = coalesce(case_when(y %in% c("hello", "hi") ~ '1',
            y == "thanks" ~ '2'), x)
}

However, this needs to be done over many column pairs, which is why I want to make it a function.

Is this the right way to do it?

df %>% mutate(across(c(col1, col3), c(col2, col4), fun))

From

col1     col2     col3     col4
1                 1
2                 4
5       "hello"   5        "hello"
3               
4                 4
5       "hi"      5        "thanks"
5       "thanks" 
5       "goodbye" 5        "hello"

To

col1     col2     col3     col4
1                 1
2                 4
1       "hello"   1        "hello"
3               
4                 4
1       "hi"      2        "thanks"
2       "thanks" 
5       "goodbye" 1        "hello"

Solution

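  • across() applies a function to each selected column on its own, and its second argument is the function itself, so it cannot take a second set of columns the way the call in the question tries to. If the operation is pairwise (col1 with col2, col3 with col4), we may need map2 instead. It returns a list of vectors, which can be assigned either to new columns or back to the existing columns of the dataset (the question's code does not make clear which is wanted; here the existing columns are overwritten).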

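    library(purrr)
    library(dplyr)

    # Recode column x of `data` from the labels in column y;
    # rows matching neither rule keep their original x value.
    fun <- function(data, x, y) {
      coalesce(case_when(data[[y]] %in% c("hello", "hi") ~ 1,
                         data[[y]] == "thanks" ~ 2), data[[x]])
    }

    # Walk the (target, label) column-name pairs and overwrite the targets.
    # `df` is the data frame defined under "data" below.
    df[c("col1", "col3")] <- map2(c("col1", "col3"), c("col2", "col4"),
                                  ~ fun(df, .x, .y))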
    

    -output

    > df
      col1    col2 col3   col4
    1    1    <NA>    1   <NA>
    2    2    <NA>    4   <NA>
    3    1   hello    1  hello
    4    3    <NA>   NA   <NA>
    5    4    <NA>    4   <NA>
    6    1      hi    2 thanks
    7    2  thanks   NA   <NA>
    8    5 goodbye    1  hello
    

    data

    df <- structure(list(col1 = c(1L, 2L, 5L, 3L, 4L, 5L, 5L, 5L), col2 = c(NA, 
    NA, "hello", NA, NA, "hi", "thanks", "goodbye"), col3 = c(1L, 
    4L, 5L, NA, 4L, 5L, NA, 5L), col4 = c(NA, NA, "hello", NA, NA, 
    "thanks", NA, "hello")), class = "data.frame", row.names = c(NA, 
    -8L))
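
A possible variation, sketched here under a few assumptions: if you would rather keep the two-argument, vector-based signature from the question, map2 can iterate over the column vectors themselves instead of over column names. The names recode_pair, targets and label_cols are made up for this illustration, and the replacement values are written as integers (1L, 2L) on the assumption that the target columns are integer, as they are in the sample data.

```r
library(dplyr)
library(purrr)

# Same sample data as above, rebuilt so the snippet runs on its own.
df <- data.frame(
  col1 = c(1L, 2L, 5L, 3L, 4L, 5L, 5L, 5L),
  col2 = c(NA, NA, "hello", NA, NA, "hi", "thanks", "goodbye"),
  col3 = c(1L, 4L, 5L, NA, 4L, 5L, NA, 5L),
  col4 = c(NA, NA, "hello", NA, NA, "thanks", NA, "hello"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: x is the column to recode, y the column of labels.
# Labels that match neither rule (including NA) leave x unchanged.
recode_pair <- function(x, y) {
  coalesce(
    case_when(
      y %in% c("hello", "hi") ~ 1L,
      y == "thanks" ~ 2L
    ),
    x
  )
}

# Assumed pairing: targets[i] is recoded from label_cols[i].
targets <- c("col1", "col3")
label_cols <- c("col2", "col4")

# Pass the columns as plain lists so map2 pairs them position by position.
df[targets] <- map2(as.list(df[targets]), as.list(df[label_cols]), recode_pair)
```

Keeping the pairing in two parallel character vectors means that adding another pair only requires extending targets and label_cols; the helper itself does not need to know about the data frame.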