r dataframe list subset

How to subset a list of data frames based on the string content of a specific column - R Language


Imagine I have the following list of data frames:

df1 <- data.frame (x = c(1, 2, 3), y = c(12, 11, 10), text = c("banana", "avocado", "letuce"))
df2 <- data.frame (x = c(4, 5, "letuce"), y = c(9, 8, 7), text = c("watermelon", "avocado", "grape"))
df3 <- data.frame (x = c(7, 8, 9), y = c(6, 5, 4), text = c("letuce", "apricot", "apple"))
df4 <- data.frame (x = c(10, 11, 12), y = c(3, "letuce", 1), text = c("pineaple", "blueberry", "morango"))

my_list <- list(df1, df2, df3, df4)

How can I keep only the data frames that contain the word "letuce" in the "text" column?

The desired result is this:

subset_list <- list(df1, df3)

I've managed to match the string using this code:

library(tidyverse)
lapply(my_list, with, str_detect(text, "letuce"))

Solution

  • You can do:

    library(tidyverse)
    my_list[my_list %>%
              map(.f = ~ any(.x$text == 'letuce')) %>%
              unlist()]
    

    which gives:

    [[1]]
      x  y    text
    1 1 12  banana
    2 2 11 avocado
    3 3 10  letuce
    
    [[2]]
      x y    text
    1 7 6  letuce
    2 8 5 apricot
    3 9 4   apple
    

    This solution matches only entries that are exactly 'letuce'. If you want to match entries that merely contain 'letuce' as a substring, you can do:

    my_list[my_list %>%
              map(.f = ~ any(str_detect(.x$text, 'letuce'))) %>%
              unlist()]
    

    Inspired by B.Grothendieck's comment (I totally forgot about keep), we could simply do:

    my_list %>%
      keep(.p = ~any(str_detect(.x$text, 'letuce')))
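
For anyone who would rather not load the tidyverse, the same filtering can probably be sketched in base R with Filter() and grepl(). This is only a minimal sketch under the question's setup: has_word() is a hypothetical helper name introduced here for illustration, and fixed = TRUE assumes the search term should be treated as a literal string rather than a regular expression.

```r
# Rebuild the example data from the question so the sketch is self-contained.
df1 <- data.frame(x = c(1, 2, 3), y = c(12, 11, 10), text = c("banana", "avocado", "letuce"))
df2 <- data.frame(x = c(4, 5, "letuce"), y = c(9, 8, 7), text = c("watermelon", "avocado", "grape"))
df3 <- data.frame(x = c(7, 8, 9), y = c(6, 5, 4), text = c("letuce", "apricot", "apple"))
df4 <- data.frame(x = c(10, 11, 12), y = c(3, "letuce", 1), text = c("pineaple", "blueberry", "morango"))
my_list <- list(df1, df2, df3, df4)

# has_word() is a hypothetical helper, not part of any package.
# It returns TRUE when any entry of the `text` column contains `word`.
has_word <- function(df, word = "letuce") {
  # fixed = TRUE matches `word` literally instead of as a regex
  any(grepl(word, df$text, fixed = TRUE))
}

# Filter() keeps the list elements for which the predicate is TRUE.
subset_list <- Filter(has_word, my_list)
```

Only the "text" column is inspected, so df2 and df4, where "letuce" appears in x or y, are dropped. For an exact match instead of a substring match, the predicate body could be swapped for any(df$text == word).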