Tags: r, list, select, data.table, rbind

From many RData within different subfolders, list only data.frames, select some columns, then rbind them all and unlist


I've searched extensively, but I can't find a solution that meets all these objectives at once:

I would like to import only the data.frames into a list (i.e., leaving out the character objects and objects of any other class), keep only columns 3, 4, 5, and 6 of each, and then rbind them all into a single new object in the environment (i.e., no longer a list).
Optional question: given the high number and large size of the data.frames, wouldn't it be better to convert the data.frames to data.tables first?

Thanks for your help.

Sorry, but given the complexity of the case, I don't see how to provide a concrete example to test.


Solution

  • Assuming

    paths = list.files('<path_to_top_level_folder>', pattern="\\.RData$", recursive=TRUE, full.names=TRUE)
    

    you could use the following as a starting point for something more robust:

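    ## explicit version: load each file into its own environment
    # result = 
    lapply(paths, \(p) {
      e = new.env()
      load(p, envir=e)
      # name of the data.frame among the loaded objects (assumes exactly one per file)
      nm = Filter(\(x) is.data.frame(get(x, envir=e)), ls(e))
      get(nm, envir=e)[3:6]  # keep columns 3 to 6
    }) |> data.table::rbindlist() # |> do.call(what='rbind') 
    
    ## streamlined: load() into the function's own environment
    ## (caveat: a loaded object named `x` would be masked inside the Filter function)
    # result = 
    lapply(paths, \(i) { 
      load(i)
      get(Filter(\(x) is.data.frame(get(x)), ls()))[3:6] 
    }) |> data.table::rbindlist()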
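
The snippets above assume that each .RData file holds exactly one data.frame. If a file may hold several (or none), a variant along the following lines might help. This is only a sketch: `load_data_frames` is a hypothetical helper name, and the two temporary files with made-up content exist solely to make the example runnable.

```r
# Hypothetical helper: return columns `cols` of every data.frame stored in
# one .RData file (zero, one or many per file).
load_data_frames <- function(path, cols = 3:6) {
  e <- new.env()
  load(path, envir = e)
  objs <- mget(ls(e), envir = e)       # all loaded objects as a named list
  dfs  <- Filter(is.data.frame, objs)  # drop characters and other classes
  lapply(dfs, function(d) d[cols])     # select columns by position
}

# Demo setup with made-up data in a temporary folder tree
tmp <- tempfile("rdata_demo_")
dir.create(file.path(tmp, "sub"), recursive = TRUE)
a <- mtcars[1:3, ]; b <- mtcars[4:5, ]; note <- "not a data.frame"
save(a, b, note, file = file.path(tmp, "one.RData"))    # two data.frames
save(note, file = file.path(tmp, "sub", "two.RData"))   # no data.frame

paths <- list.files(tmp, pattern = "\\.RData$", recursive = TRUE, full.names = TRUE)

# Flatten the per-file lists by one level, then bind everything
per_file <- lapply(paths, load_data_frames)
result <- data.table::rbindlist(unlist(per_file, recursive = FALSE))
dim(result)
```

Files without any data.frame simply contribute nothing to the result, so they do not need special handling.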
    

    If the name of the data.frame is always the same, this can be written even more concisely. Subsetting by column names is less error-prone than subsetting by position, but requires that all data frames use the same column names (without typos).
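
As a rough illustration of both points, the sketch below assumes that every file stores its data.frame under the name `df` and that the wanted columns are called `disp`, `hp`, `drat` and `wt`. Both the object name and the column names are assumptions made up for this example, as are the temporary demo files.

```r
# Assumed for illustration only: object name and column names
obj_name <- "df"
cols <- c("disp", "hp", "drat", "wt")

# Demo files with made-up content
tmp <- tempfile("rdata_names_")
dir.create(tmp)
df <- mtcars[1:2, ]; save(df, file = file.path(tmp, "a.RData"))
df <- mtcars[3:6, ]; save(df, file = file.path(tmp, "b.RData"))
rm(df)

paths <- list.files(tmp, pattern = "\\.RData$", full.names = TRUE)

result_named <- data.table::rbindlist(
  lapply(paths, function(p) {
    e <- new.env()
    load(p, envir = e)
    e[[obj_name]][cols]   # errors if a column name is missing or misspelled
  }),
  use.names = TRUE        # bind by column name rather than by position
)
dim(result_named)
```

Selecting by name makes a misspelled or missing column fail loudly at load time instead of silently binding the wrong columns.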


    Note

    data.table::rbindlist() handles data.frame objects just fine, and is quite fast.

    > class(mtcars[3:6])
    [1] "data.frame"
    > dim(mtcars[3:6])
    [1] 32  4
    > list(mtcars[3:6], mtcars[3:6] * 2) |> data.table::rbindlist() |> dim()
    [1] 64  4
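
Regarding the optional question, a small sketch suggesting that converting each data.frame beforehand may not be necessary: `rbindlist()` accepts plain data.frames and returns a data.table, which can be turned back into a plain data.frame with `setDF()` if that is preferred. The object names `parts` and `res` are just placeholders.

```r
library(data.table)

parts <- list(mtcars[3:6], mtcars[3:6] * 2)  # plain data.frames, not converted
res <- rbindlist(parts)
class(res)   # "data.table" "data.frame"

setDF(res)   # optional: convert back to a plain data.frame by reference
class(res)   # "data.frame"
```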