In a folder (path = "D:/DataLogs/
), I have several subfolders. Inside these subfolders, I would like to retrieve all the csv starting only with "QCLog" and merge them (rbind
) into a single data.frame, while creating a first new column including the full path/name of these QCLog csv.
The two difficulties are:
Is this feasible?
Thanks for help
The following code, although a bit long, works fine and fast:
library(tidyverse)
library(data.table)
# paths
main_path <- "D:/DataLogs/" # target directory
zip_path <- "D:/DataLogs/Zip_files/" # subfolder where the zips will be unzipped
# specific string of csv file name
str_csv <- "QCLog"
# list csv
csv_nozip_list <- list.files(main_path, paste0(str_csv, ".*\\.csv"), recursive = TRUE, full.names = TRUE) # list of QCLog csv that ARE NOT within zip
zip_list <- list.files(main_path, "\\.zip", recursive = TRUE, full.names = TRUE) # list of all csv -including QCLog- that ARE within zip
unzip_files <- lapply(zip_list, unzip, exdir = zip_path) # unzip all CSV in the zip_path subfolder
csv_zip_list <- list.files(zip_path, paste0(str_csv, ".*\\.csv"), recursive = TRUE, full.names = TRUE) # list only QCLog that have been decompressed in zip_path
# read matching files while adding a new filename column
csv_nozip_read <- lapply(csv_nozip_list, function(x) {data.table(fread(x), "filename" = x)})
csv_zip_read <- lapply(csv_zip_list, function(x) {data.table(fread(x), "filename" = x)})
# rbind lists
csv_nozip_rbind <- data.table::rbindlist(csv_nozip_read)
csv_zip_rbind <- data.table::rbindlist(csv_zip_read)
# rbind selected csv
dat <- rbind(csv_nozip_rbind, csv_zip_rbind) |>
distinct() |> # if needed
relocate(filename)