When extracting information from a PDF using tabulizer and pdftools, I sometimes need to index into a large list of data frames based on a regex pattern match.
a <- data.frame(yes=c("pension"))
b <- data.frame(no=c("other"))
my_list <- list(a,b)
I would like to use str_detect to return the index of each underlying data frame that matches the pattern "pension".
The desired output would be:
index <- 1  # based on which() and str_detect()
new_list <- my_list[[index]]
new_list
yes
1 pension
My struggle has been detecting the pattern inside the underlying data frames and then returning the index with which. Previous discussions use loops and if-then statements, but a solution using purrr seems preferable.
We may use
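library(purrr)  # provides map(), map_lgl() and the %>% pipe
# TRUE for each data frame in l that has a match in any of its columns
getIdx <- function(pattern, l)
  l %>% map_lgl(~ any(unlist(map(.x, grepl, pattern = pattern))))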
getIdx("pension", my_list)
# [1] TRUE FALSE
my_list[getIdx("pension", my_list)]
# [[1]]
# yes
# 1 pension
This allows for multiple matching data frames. (No need for which, really.)
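If integer positions are wanted after all, as in the question, the logical vector can be passed to which. A minimal sketch, reusing the getIdx helper and the my_list example from above; the names idx and first_idx are only illustrative, and detect_index is a purrr function that stops at the first match:

```r
library(purrr)

a <- data.frame(yes = c("pension"))
b <- data.frame(no = c("other"))
my_list <- list(a, b)

# Same helper as above: TRUE for each data frame with a match in any column.
getIdx <- function(pattern, l)
  map_lgl(l, ~ any(unlist(map(.x, grepl, pattern = pattern))))

# Integer positions of all matching data frames.
idx <- which(getIdx("pension", my_list))
idx
# [1] 1

# Position of the first matching data frame only (0 if nothing matches).
first_idx <- detect_index(
  my_list,
  ~ any(unlist(map(.x, grepl, pattern = "pension")))
)
```

Logical indexing with my_list[getIdx("pension", my_list)] and integer indexing with my_list[idx] select the same elements, so which is only needed when the positions themselves matter.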
In getIdx we go over the data frames of l; then, in a given data frame, we go over its columns and use grepl. If there is a match in any of the columns, TRUE is returned for the corresponding data frame.
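Since the question asks specifically about str_detect, the same two-level traversal can be sketched with stringr in place of grepl. This is one possible variant, not the only way to do it: get_idx_str is a hypothetical name, and the as.character coercion is an assumption added so that factor columns are handled too.

```r
library(purrr)
library(stringr)

a <- data.frame(yes = c("pension"))
b <- data.frame(no = c("other"))
my_list <- list(a, b)

# Hypothetical helper: outer loop over data frames, inner loop over columns.
get_idx_str <- function(pattern, l) {
  map_lgl(l, function(df) {
    # One logical per column: does any cell match the pattern?
    # as.character() lets factor columns be searched; NAs are ignored.
    col_hits <- map_lgl(
      df,
      ~ any(str_detect(as.character(.x), pattern), na.rm = TRUE)
    )
    # The data frame matches if at least one column does.
    any(col_hits)
  })
}

index <- which(get_idx_str("pension", my_list))
index
# [1] 1

new_list <- my_list[[index]]
```

Note that my_list[[index]] only works when exactly one data frame matches; with several matches, my_list[index] returns them all as a list.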