
Mutate multiple variables using across(), starts_with() and ifelse()


I have the following dataframe.

library(dplyr)
data_test = data.frame(va_te=c("yes", "", "no", "yes"),
           va_ti=c("no", "", "yes", "no"),
           va_ze=c("", "no", "yes", "no"),
           jk_te=c(NA, 545, 876, 987),
           jk_ti=c(876, 567, 908, 432),
           jk_ze=c(987, 988, NA, 234),
           loc=c(345, 898, 444, 321))

> data_test
  va_te va_ti va_ze jk_te jk_ti jk_ze loc
1   yes    no          NA   876   987 345
2                no   545   567   988 898
3    no   yes   yes   876   908    NA 444
4   yes    no    no   987   432   234 321

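I would like to mutate the first six variables using two rules, one for each group of three similar variables (va_ and jk_). In the va_ columns, empty strings should become NA. In the jk_ columns, a value should become NA unless the matching va_ column is "yes"; when it is "yes" and the jk_ value is missing, the value should be taken from loc.

For the first group (va_), I used across() + starts_with() + ifelse().

For the second group (jk_), I would like to use the same method, but because my ifelse() test uses both the jk_ column and its matching va_ column, I don't know how to write this part inside across(). For now I mutate each jk_ column separately: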

data_test_2 = data_test %>%
  mutate(across(starts_with("va_"), 
                            ~ ifelse(.x == "", NA, .x), .names = "{col}")) %>%
  mutate(jk_te = ifelse(va_te != "yes", NA, 
                 ifelse(va_te == "yes" & is.na(jk_te), loc, jk_te))) %>%
  mutate(jk_ti = ifelse(va_ti != "yes", NA, 
                 ifelse(va_ti == "yes" & is.na(jk_ti), loc, jk_ti))) %>%
  mutate(jk_ze = ifelse(va_ze != "yes", NA, 
                 ifelse(va_ze == "yes" & is.na(jk_ze), loc, jk_ze))) 

> data_test_2
  va_te va_ti va_ze jk_te jk_ti jk_ze loc
1   yes    no  <NA>   345    NA    NA 345
2  <NA>  <NA>    no    NA    NA    NA 898
3    no   yes   yes    NA   908   444 444
4   yes    no    no   987    NA    NA 321
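The three jk_ mutate() calls above repeat one rule for three va_/jk_ pairs. As a minimal sketch, that rule can be written once as a plain function; fill_jk is a hypothetical name used only for illustration, not a dplyr function. Because ifelse() returns NA wherever its test is NA, and the inner result is only used where va is "yes", the inner va == "yes" check can be dropped.

```r
# Hypothetical helper (not part of dplyr): the jk_ rule for one va_/jk_ pair.
# Returns NA where va is not "yes" (or is NA); otherwise jk, with missing
# jk values filled from loc.
fill_jk <- function(va, jk, loc) {
  ifelse(va != "yes", NA, ifelse(is.na(jk), loc, jk))
}

# va_te (after the blank-to-NA step), jk_te and loc from the question
fill_jk(c("yes", NA, "no", "yes"), c(NA, 545, 876, 987), c(345, 898, 444, 321))
# [1] 345  NA  NA 987
```

Inside mutate(), each of the three calls would then reduce to something like jk_te = fill_jk(va_te, jk_te, loc).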

Solution

  • You can use sub() to replace "jk" with "va" in the name of the current column (cur_column()), and then use get() to retrieve the matching va_ column for the test.

    library(dplyr)
    
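    data_test %>% 
      mutate(
        # Blank strings in the va_ columns become NA
        across(starts_with("va_"), ~ ifelse(.x == "", NA, .x)),
        # For each jk_ column, get() the va_ column with the same suffix:
        # NA unless it is "yes"; a missing jk_ value is then filled from loc
        across(starts_with("jk_"), ~ ifelse(get(sub("jk", "va", cur_column())) != "yes", NA,
                                            ifelse(get(sub("jk", "va", cur_column())) == "yes" & is.na(.x), loc, .x)))
      )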
    
      va_te va_ti va_ze jk_te jk_ti jk_ze loc
    1   yes    no  <NA>   345    NA    NA 345
    2  <NA>  <NA>    no    NA    NA    NA 898
    3    no   yes   yes    NA   908   444 444
    4   yes    no    no   987    NA    NA 321
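The same name mapping also works without get() or dplyr. As a base R sketch (an alternative technique, not the answer's method), you can loop over the jk_ column names, derive each va_ partner with sub(), and index the data frame by name with [[ ]]. The example rebuilds data_test so it runs on its own; out is just an illustrative name.

```r
# Same input as in the question
data_test <- data.frame(va_te = c("yes", "", "no", "yes"),
                        va_ti = c("no", "", "yes", "no"),
                        va_ze = c("", "no", "yes", "no"),
                        jk_te = c(NA, 545, 876, 987),
                        jk_ti = c(876, 567, 908, 432),
                        jk_ze = c(987, 988, NA, 234),
                        loc = c(345, 898, 444, 321),
                        stringsAsFactors = FALSE)

out <- data_test
# Blank strings in the va_ columns become NA
va_cols <- grep("^va_", names(out), value = TRUE)
out[va_cols][out[va_cols] == ""] <- NA

# For each jk_ column, look up its va_ partner by swapping the prefix,
# the same mapping the answer builds with sub() and cur_column()
for (jk in grep("^jk_", names(out), value = TRUE)) {
  va <- out[[sub("^jk_", "va_", jk)]]
  out[[jk]] <- ifelse(va != "yes", NA, ifelse(is.na(out[[jk]]), out$loc, out[[jk]]))
}
out
```

Looking columns up by name with [[ ]] makes the jk_ to va_ pairing explicit; the trade-off is giving up the single mutate() pipeline.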