I am struggling with creating a for loop over all txt.files in a specific repository. The goal is to merge all separately saved txt.files in a dataframe and add an ID-variable that can always be found in the txt-file-names (e.g., ID=10 for the file "10_1. Recording 01.10.2015 131514_CsvData.txt" )
txt_files <- list.files("Data/study", pattern = ".txt")
txt_files [1] "1_1. Recording 18.09.2015 091037_CsvData.txt" "10_1. Recording 01.10.2015 131514_CsvData.txt"
[3] "100_1. Recording 02.10.2015 091630_CsvData.txt" "104_1. Recording 22.09.2015 142604_CsvData.txt"
[5] "107_1. Recording 18.09.2015 104300_CsvData.txt" "110_1. Recording 29.09.2015 081558_CsvData.txt"
[7] "112_1. Recording 21.09.2015 082908_CsvData.txt" "114_1. Recording 29.09.2015 101159_CsvData.txt"
[9] "115_1. Recording 23.09.2015 141204_CsvData.txt" "116_1. Recording 30.09.2015 110624_CsvData.txt"
[11] "117_1. Recording 01.10.2015 141227_CsvData.txt" "120_1. Recording 17.09.2015 153516_CsvData.txt"
Read in and merge txt.files
for ( file in txt_files){
# if the merged dataframe "final_df" doesn't already exist, create it
if (!exists("final_df")){
final_df<- read.table(paste("Data/study/",file, sep=""), header=TRUE, fill=TRUE)
temp_ID <- substring(file, 0,str_locate_all(pattern ='_1.',file)[[1]][1]-1)
final_df$ID <- temp_ID
final_df <- as.data.frame(final_df)
}
# if the merged dataframe does already exist, append to it
else {
temp_dataset <- read.table(paste("Data/study/",file, sep=""), header=TRUE, fill=TRUE)
# extract ID column from filename
temp_ID <- substring(file, 0,str_locate_all(pattern ='_1.',file)[[1]][1]-1)
temp_dataset$ID <- temp_ID
final_df<-rbind(final_df, temp_dataset)
}
return(as.data.frame(final_df))
}
Avoid using rbind
in a loop which leads to excessive copying in memory. Consider building a list of data frames and bind them together once with do.call
outside of any loop. For this approach, lapply
is a useful iterative alternative than for
to build such a list of data frames as you avoid the bookkeeping of initializing an empty list and iteratively updating elements.
Also consider paste0
with no separator argument and gsub
to remove any characters from underscore to end of string for to extract ID.
setwd("Data/study")
txt_files <- list.files(pattern = ".txt")
df_list <- lapply(txt_files, function(file)
transform(read.table(file, header=TRUE, fill=TRUE),
temp_ID = gsub("_.*", "", file))
)
final_df <- do.call(rbind, df_list)