
Is there a way to use data.frame to iterate through a list of texts in R?


I used readLines (a base R function) to read my list of legal cases into R. The code looks like this:

my_list <- list.files(path = "C:\\Users\\Ben Tice\\Documents\\R Stuff\\UW  Job\\Cases\\data\\txts", pattern = "\\.txt$")

cases_lines <- lapply(my_list, readLines)

I can convert individual cases successfully like this:

df_convert <- data.frame(line=1:length(cases_lines[[1]]), text=cases_lines[[1]])

But I would like to be able to use data.frame on all 75 cases without having to convert each one separately.

I tried using the lapply function as well as for loops, but I cannot get either of them to work. For example,

df_convert2 <- lapply(cases_lines, data.frame(line=1:length(cases_lines[[i]]), text=cases_lines[[i]])) 

fails with the following error message: "Error in cases_lines[[i]] : recursive indexing failed at level 2".

Ultimately, I need a list of the cases as data frames, so I can iterate through them with stringr functions to look for character patterns.


Solution

  • Since you already have all of your case files in a list, you can loop over that list, convert each case into a data frame, and store the results in a second list. Something like this should work:

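    # Pre-allocate one slot per case, then fill each slot with a data frame
    cases_df <- vector("list", length(cases_lines))
    for (i in seq_along(cases_lines)) {
      cases_df[[i]] <- data.frame(line = seq_along(cases_lines[[i]]), text = cases_lines[[i]])
    }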
    

    This should create a list of 75 data frames, one per case file.
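
    As for why the lapply attempt in the question fails: lapply expects a function as its second argument, but data.frame(...) there is a call that R evaluates once, up front, using whatever i happens to be in the workspace rather than a loop index. Wrapping the call in an anonymous function fixes that. Below is a minimal sketch; the two short character vectors are made-up stand-ins for the real cases_lines, and the final grepl step is only a base R placeholder for whatever stringr search you end up using.

```r
# Hypothetical stand-in for the list returned by lapply(my_list, readLines)
cases_lines <- list(
  c("Smith v. Jones", "The court finds for the plaintiff."),
  c("Doe v. Roe", "Appeal dismissed.", "Costs awarded.")
)

# lapply() calls the anonymous function once per case;
# `case` is the character vector of lines for that case
cases_df <- lapply(cases_lines, function(case) {
  data.frame(line = seq_along(case), text = case, stringsAsFactors = FALSE)
})

# Each element is now a data frame, so the list can be searched case by case.
# Here: keep only the rows whose text contains "court".
has_court <- lapply(cases_df, function(df) df[grepl("court", df$text), ])
```

    If you want to keep track of which data frame came from which file, you could also name the list, for example with names(cases_df) <- my_list, assuming my_list is in the same order as cases_lines.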