r, select, dplyr, grepl

How to use grepl() to select specific strings in a list of dataframes?


In a list of data frames, I need to select the variable named "id" and any variables whose names include "Duo". The output should therefore have two variables per data frame.

data <- list(foo = structure(list(bodyPart = c("leg", "arm", "knee"), 
side = c("LEFT", "RIGHT", "LEFT"), device = c("LLI", "LSM", 
"GHT"), `Duo:length` = c(12, 476, 7), id = c("AA", "BB", "CC"), 
mwID = c("a12", "k87", "j98")), class = "data.frame", row.names = c(NA, 
-3L)), bar = structure(list(bodyPart = c("ankel", "ear", "knee"
), `Duo:side` = c("LEFT", "LEFT", "LEFT"), device = c("GOM", "LSM", 
"YYY"), id = c("ZZ", "DD", "FF"), tqID = c("kj8", "ll23", "sc26"
)), class = "data.frame", row.names = c(NA, -3L)))

Desired output:

output <- list(foo = structure(list(`Duo:length` = c(12, 476, 7), id = c("AA", "BB", "CC")), 
class = "data.frame", row.names = c(NA, -3L)), 
bar = structure(list(`Duo:side` = c("LEFT", "LEFT", "LEFT"), id = c("ZZ", "DD", "FF")), 
class = "data.frame", row.names = c(NA, -3L)))

Here is the code that yields only the id columns. I am not sure why it can't get the columns including Duo.

lapply(data, function(cr) cr %>% dplyr::select(id, where(~any(grepl("Duo", names(.))))))

I would really appreciate your advice.


Solution

  • Using dplyr syntax:

    library(dplyr)
    
    purrr::map(data, ~.x %>% select(id, contains("Duo")))
    
    #$foo
    #  id Duo:length
    #1 AA         12
    #2 BB        476
    #3 CC          7
    
    #$bar
    #  id Duo:side
    #1 ZZ     LEFT
    #2 DD     LEFT
    #3 FF     LEFT
    

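    Or using a regular expression. This keeps the columns in their original order, and it also selects any other column whose name starts with "id" or "Duo".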

    purrr::map(data, ~.x %>% select(matches("^(id|Duo)")))
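
As for why the original attempt returns only `id`: `where()` applies its predicate to each column's values, not to its name. Inside the lambda, `.` is a plain vector, so `names(.)` is `NULL` and the test is always `FALSE`. If you would rather stay with `grepl()`, a minimal base R sketch is to test the column names directly. Here `keep_cols` is just a hypothetical helper name, and the data is rebuilt from the question so the snippet runs on its own.

```r
# Rebuild the question's example list so the snippet is self-contained.
data <- list(
  foo = data.frame(
    bodyPart = c("leg", "arm", "knee"),
    side = c("LEFT", "RIGHT", "LEFT"),
    device = c("LLI", "LSM", "GHT"),
    `Duo:length` = c(12, 476, 7),
    id = c("AA", "BB", "CC"),
    mwID = c("a12", "k87", "j98"),
    check.names = FALSE  # keep "Duo:length" as-is instead of "Duo.length"
  ),
  bar = data.frame(
    bodyPart = c("ankel", "ear", "knee"),
    `Duo:side` = c("LEFT", "LEFT", "LEFT"),
    device = c("GOM", "LSM", "YYY"),
    id = c("ZZ", "DD", "FF"),
    tqID = c("kj8", "ll23", "sc26"),
    check.names = FALSE
  )
)

# What the where() predicate actually receives: the column's values.
# A plain vector carries no names, so grepl() gets NULL and any() is FALSE.
col <- data$foo[["Duo:length"]]
names(col)                     # NULL
any(grepl("Duo", names(col)))  # FALSE

# Hypothetical helper: keep the column named exactly "id" plus every
# column whose name contains the literal string "Duo".
keep_cols <- function(df) {
  keep <- names(df) == "id" | grepl("Duo", names(df), fixed = TRUE)
  df[, keep, drop = FALSE]  # drop = FALSE keeps a data frame even for one column
}

result <- lapply(data, keep_cols)
```

Unlike `select(id, contains("Duo"))`, this keeps the columns in their original order, and the match on "Duo" is case-sensitive.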