r, rbind

Use grep to get data frame names from the environment, then stack their rows with rbind in R


I have thousands of data frames and want to grep their names into a character vector, then use that vector to rbind them all into one data frame. Any suggestions?

dat1lkq6 <- data.frame(color = c('COLOR: RED', 'COLOR: RED', 'COLOR: BLUE', 'COLOR: GREEN', 'COLOR: BLUE'))
dat1ah2 <- data.frame(style = c('SPORTY', 'HYBRID', 'FORMAL', 'CASUAL', 'CASUAL'))
dat29fg <- data.frame(color = c('COLOR: RED', 'COLOR: CYAN', 'COLOR: BLUE', 'COLOR: RED', 'COLOR: BLUE'))
dat2xl <- data.frame(color = c('COLOR: RED', 'COLOR: CYAN', 'COLOR: BLUE', 'COLOR: RED', 'COLOR: BLUE'))
dat3g49 <- data.frame(color = c('COLOR: PURPLE', 'COLOR: RED', 'COLOR: BLUE', 'COLOR: GREEN', 'COLOR: BLUE'))
skus4 <- data.frame(sku = c('SKU: 1849354', 'SKU: 392856', 'SKU: 921385', 'SKU: 6395474', 'SKU: 8532449', 'SKU: 0285468', 'SKU: 2948327'))

#grep to get only "dat" dataframe names
all_dat_df <- base::ls(all.names = TRUE)[base::grep("^dat", base::ls(all.names = TRUE))]

#want to stack all the "dat" df's into one df, but not working
#result dataframe should have 25 rows
rbind(all_dat_df)

#tried various incarnations of dput, gsub, paste, noquote to no success

Solution

  • First, use the grep() function to collect the names of all objects whose names start with "dat", as you are already doing:

    all_objects <- base::ls()
    all_dat_df <- all_objects[base::grep("^dat", all_objects)]
    

    Now all_dat_df is a character vector of object names. These are only names: the vector carries no information about where each object lives or what values it holds.
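
    A quick way to see why rbind(all_dat_df) does not stack the data frames is to call rbind() on a character vector and inspect the result. The sketch below uses a hypothetical two-element vector, name_vec, as a stand-in for all_dat_df:

    ```r
    # name_vec is a hypothetical stand-in for all_dat_df: plain strings, not data frames.
    name_vec <- c("dat1lkq6", "dat1ah2")

    # rbind() treats the character vector as a single row of strings.
    stacked <- rbind(name_vec)

    is.matrix(stacked)     # TRUE: a character matrix, not a data frame
    is.character(stacked)  # TRUE
    dim(stacked)           # 1 row, 2 columns: one cell per name
    ```

    The result is a one-row matrix holding the names themselves, which is why the objects have to be looked up by name first.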

    So you need to turn these names into the actual objects (the data.frames) you want to combine. To do that, you ask R to collect the objects into a list with the lapply() function.

    The lapply() function applies get() to each object name in all_dat_df. get() looks up the name in the given environment and returns the object it refers to. lapply() then stores the results of get() in an R list.

    list_of_data_frames <- lapply(all_dat_df, get, envir = globalenv())
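
    As a variation on the lookup step, base R's mget() fetches several objects by name in one call and returns a named list, and Filter(is.data.frame, ...) drops anything that matched the pattern but is not a data frame. This is only a sketch: demo_env, dat_a, dat_b and dat_note are hypothetical names, and a separate environment is used so the example does not depend on what is in your workspace. Swap demo_env for globalenv() to run it against your own objects.

    ```r
    # Hypothetical environment standing in for the global environment.
    demo_env <- new.env()
    assign("dat_a", data.frame(color = c("COLOR: RED", "COLOR: BLUE")), envir = demo_env)
    assign("dat_b", data.frame(color = "COLOR: CYAN"), envir = demo_env)
    assign("dat_note", "matches the pattern, but is not a data frame", envir = demo_env)

    # ls() can filter by pattern directly, so a separate grep() call is optional.
    dat_names <- ls(envir = demo_env, pattern = "^dat")

    # mget() looks up every name and returns a list named after the objects.
    objs <- mget(dat_names, envir = demo_env)

    # Keep only the elements that really are data frames.
    dfs <- Filter(is.data.frame, objs)
    names(dfs)
    ```

    Keeping the names on the list makes it easier to trace each row back to its source object later.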
    

    After that, you just need to pass the list of data.frames you collected to the dplyr::bind_rows() function:

    big_data_frame <- dplyr::bind_rows(list_of_data_frames)
    

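    Now the big_data_frame object holds a single data.frame with all rows from all the data.frames that grep() found in your global environment. For the example data this gives 25 rows. Because dat1ah2 has a style column instead of color, the result has both columns, filled with NA wherever a data.frame lacks one of them.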

    NOTE: To use the dplyr::bind_rows() function, you need to have the dplyr package installed on your machine. If you do not have it installed, use the code below to install it:

    install.packages("dplyr")
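
    If installing dplyr is not an option, something similar can be sketched in base R with do.call(rbind, ...). This is not the approach used above, and it takes one extra step: rbind() on data frames expects matching column names, so missing columns are added as NA first. The names demo_env, dat_a and dat_b are hypothetical, and the example assumes simple character columns like those in the question.

    ```r
    # Hypothetical environment standing in for the global environment.
    demo_env <- new.env()
    assign("dat_a",
           data.frame(color = c("COLOR: RED", "COLOR: BLUE"), stringsAsFactors = FALSE),
           envir = demo_env)
    assign("dat_b",
           data.frame(style = c("SPORTY", "CASUAL"), stringsAsFactors = FALSE),
           envir = demo_env)

    # Collect the matching objects into a named list.
    dfs <- mget(ls(envir = demo_env, pattern = "^dat"), envir = demo_env)

    # Union of all column names across the data frames.
    all_cols <- unique(unlist(lapply(dfs, names)))

    # Add any missing column as NA and put the columns in a common order.
    aligned <- lapply(dfs, function(df) {
      for (col in setdiff(all_cols, names(df))) df[[col]] <- NA
      df[all_cols]
    })

    # Stack the aligned data frames row-wise.
    stacked <- do.call(rbind, aligned)
    nrow(stacked)
    ```

    With thousands of data frames this may be noticeably slower than dplyr::bind_rows(), so treat it as a fallback rather than a replacement.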