rdplyrgreplmutate

Mutate to concatenate columns that contain a specific string in their names


I'm trying to create a new column that concatenates all values of a specific set of columns that contain a specific string in their names, with a semi-colon separator. I'm working within dplyr, so I'm seeking a tidyverse solution.

I attempted to use grepl() in combination with mutate(), case_when(), and paste() to identify the columns that contained the string I wanted in their names ("Games") and concatenate their contents together into a new column. When that failed, I attempted to use str_detect instead, but that didn't work out.

As far as I can tell, my issue is that I can't correctly instruct the code to evaluate all column names and then return the ones that have a string that contains my specified pattern. I've tried using contains("Games"), colnames(.x), and other variations on those arguments. I know I could do this if I specifically named every column I wished to paste together, but I prefer a relative solution that prevents me from typing out multiple names.

Thank you!

# Sample Data

test<-as_tibble(data.frame(`ID` = c("1","2","3"),
                           `Gender` = c("Female","Male","Non-Binary"),
                           `Games_Chess`=c("Chess",NA,"Chess"),
                           `Games_Clue`=c("Clue",NA,NA),
                           `Games_Scrabble`=c("Scrabble",NA,"Scrabble")))
# A tibble: 3 × 5
  ID    Gender     Games_Chess Games_Clue Games_Scrabble
  <chr> <chr>      <chr>       <chr>      <chr>         
1 1     Female     Chess       Clue       Scrabble      
2 2     Male       NA          NA         NA            
3 3     Non-Binary Chess       NA         Scrabble  

# Desired Output

ID    Gender     Games_Chess Games_Clue Games_Scrabble Games    
1     Female     Chess       Clue       Scrabble       Chess; Clue; Scrabble
2     Male       NA          NA         NA             NA
3     Non-Binary Chess       NA         Scrabble       Chess; Scrabble  

# Attempted Code 1

test<-test%>%
  mutate(`Games` = case_when(str_detect(colnames(test),"Games") ~ paste(.x, collapse = ";"), TRUE ~ NA))

# Error Code 1

Error in `mutate()`:
ℹ In argument: `Games = case_when(...)`.
Caused by error in `case_when()`:
! Failed to evaluate the right-hand side of formula 1.
Caused by error:
! object '.x' not found

# Attempted Code 2
test<-test%>%
  mutate(`Games` = case_when(grepl("Games",.) ~ paste(., collapse = ";"), TRUE ~ NA))

# Error Code 2

Error in `mutate()`:
ℹ In argument: `Games = case_when(...)`.
Caused by error:
! `Games` must be size 3 or 1, not 4.
Run `rlang::last_trace()` to see where the error occurred.

Solution

  • Though not entirely dplyr-based, the following solution does still fall under the tidyverse using tidyr:

    test %>%
      tidyr::unite(
        # Name of new column
        col = "Games",
        # Select columns to unite using tidy-select syntax
        dplyr::starts_with("Games"),
        # Specify semi-colon as separator
        sep = "; ",
        # Keep original Games_* columns
        remove = FALSE,
        # Remove NA's prior to concatenation
        na.rm = TRUE
      )
    

    Which results in the following output:

    #>   ID     Gender                 Games Games_Chess Games_Clue Games_Scrabble
    #> 1  1     Female Chess; Clue; Scrabble       Chess       Clue       Scrabble
    #> 2  2       Male                              <NA>       <NA>           <NA>
    #> 3  3 Non-Binary       Chess; Scrabble       Chess       <NA>       Scrabble
    

    Edit: optionally, if you'd like the new column to be at the very end of your data, then you can add the following code at the end using the %>% pipe operator to move the new column to the end:

    dplyr::relocate(
        Games,
        .after = dplyr::everything()
      )