I have a large file named Objects_Population - AllCells.txt, about 3 GB, with 25704373 rows and 132 variables. I want to read the file and split its rows on one variable, the column named treatmentsum. This column holds experimental drug treatments under different conditions (3S or UNS), written as strings joined with "_". The split should put all rows with the same treatment together. After splitting, I want to write each group to its own file, named after its treatmentsum value.
My code is below:
#load libraries
library(tidyverse)
library(vroom)
library(dplyr)
library(stringr)
#read in the file, skip the first 9 rows
files<-vroom("Objects_Population - AllCells.txt", delim = "\t",skip = 9,col_names = T)
#split the files based on treatmentsum
splited<- files %>%
group_split(files$treatmentsum)
#write out the splitted files
output<- lapply(splited, function(i){
for (i in 1:length(splited)) {
write.table(splited[[i]][,1:131],file=paste(unique(splited[[i]]$treatmentsum),".txt"), sep="\t", row.names=FALSE)
}
})
When I run it, the file is read correctly and the split works as expected: I get a list of 1092 elements (shown in the environment), each containing the rows for one treatment. However, the code dies every time after writing 233 files. I have taken a screenshot of the error; all the files generated are 3S, and no UNS files are generated (as you can see in the file directory at the bottom right of the screenshot). Can someone help me with this and tell me what the error means?
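I figured out that the problem was the file names: some treatment names contain "/", which is interpreted as a directory separator when it appears in a file name. Inspired by this answer, https://stackoverflow.com/a/49647853/12362355, I now remove the "/" with gsub() before building the file name: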
library(tidyverse)
library(vroom)
library(dplyr)
library(stringr)
files<-vroom("Objects_Population - AllCells.txt", delim = "\t",skip = 9,col_names = T)
splited<- files %>%
group_split(files$treatmentsum)
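#write out the split files, removing "/" from the treatment name
#(a single loop is enough; wrapping it in lapply() would repeat the whole loop once per group)
for (i in seq_along(splited)) {
write.table(splited[[i]][,1:131],file=paste0(gsub("/","",unique(splited[[i]]$treatmentsum)),".txt"), sep="\t",
row.names=FALSE)
}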
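For anyone hitting the same problem, the idea above can be sketched on a tiny made-up table using only base R. The data, the helper name safe_name, and the output folder are assumptions for illustration, not part of my real pipeline. Instead of deleting "/", the helper replaces every character outside letters, digits, ".", "_" and "-" with "-", so that two treatments such as "a/b" and "ab" are less likely to end up with the same file name.

```r
# Toy data standing in for the real table (values are invented for illustration)
cells <- data.frame(
  treatmentsum = c("drugA_3S", "drugA_3S", "drugB/C_UNS", "drugB/C_UNS"),
  area = c(10.5, 11.2, 9.8, 10.1),
  stringsAsFactors = FALSE
)

# Hypothetical helper: replace anything that is not a letter, digit,
# dot, underscore or hyphen with "-", so "/" can no longer act as a path separator
safe_name <- function(x) gsub("[^A-Za-z0-9._-]", "-", x)

# Write into a scratch folder so the example does not touch real data
out_dir <- file.path(tempdir(), "split_by_treatment")
dir.create(out_dir, showWarnings = FALSE)

# split() returns a named list with one data frame per treatment
groups <- split(cells, cells$treatmentsum)

# One pass over the groups, one file per treatment
for (name in names(groups)) {
  out_file <- file.path(out_dir, paste0(safe_name(name), ".txt"))
  write.table(groups[[name]], file = out_file, sep = "\t",
              row.names = FALSE, quote = FALSE)
}

list.files(out_dir)
```

Because the list returned by split() is named by treatment, the file name comes straight from names(groups) and there is no need for unique() on each piece. If two different treatments could still map to the same sanitized name, it is worth checking anyDuplicated(safe_name(names(groups))) before writing.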