I have a CSV file with 12,626,756 rows that I need to split into smaller files so a colleague can open them in Excel. I want to write a loop that splits the data into chunks that fit within Excel's row limit and exports each chunk as a CSV file until it reaches the end (it should produce 13 files).
#STEP 1: load data
data <- read.csv(".../Desktop/Data/file.csv", header = TRUE)
#STEP 2: count rows
totalrows <- nrow(data)
#STEP 3: determine how many splits you need
excelrowlimit <- 1048576 - 5  # Excel allows 1,048,576 rows; subtract a few for headroom
filesrequired <- ceiling(totalrows / excelrowlimit)
For example:
csvfile 1 should contain rows 1:1048571
csvfile 2 should contain rows 1048572:2097143
csvfile 3 should contain rows 2097144:3145715
csvfile 4 should contain rows 3145716:4194287
... and so on
how can I write a loop statement that (1) splits by number of files needed and (2) gives a different file name to each csv export?
Here's a solution expanding on my comment above. It should have a smaller memory footprint than the other solutions, since it never copies all or part of the original data frame.
library(tidyverse)

rowCount <- 1048571  # rows per output file

data %>%
  mutate(Group = ceiling(row_number() / rowCount)) %>%  # assign each row to a chunk
  group_by(Group) %>%
  group_walk(
    function(.x, .y) {
      # .x is the chunk's data (Group column excluded); .y holds the group key
      write.csv(.x, file = paste0("file", .y$Group, ".csv"), row.names = FALSE)
    }
  )
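For completeness, here is a minimal base-R sketch of the explicit loop the question asks for, wrapped in a helper function. The function name `split_for_excel`, the `"file"` prefix, and the default row limit are illustrative assumptions, not from the original post:

```r
# Split a data frame into Excel-sized CSV chunks.
# Returns the number of files written.
split_for_excel <- function(data, prefix = "file", rowlimit = 1048576 - 5) {
  totalrows     <- nrow(data)
  filesrequired <- ceiling(totalrows / rowlimit)
  for (i in seq_len(filesrequired)) {
    first <- (i - 1) * rowlimit + 1        # first row of this chunk
    last  <- min(i * rowlimit, totalrows)  # last row (handles the short final chunk)
    # drop = FALSE keeps a one-column data frame from collapsing to a vector
    write.csv(data[first:last, , drop = FALSE],
              file = paste0(prefix, i, ".csv"),
              row.names = FALSE)
  }
  filesrequired
}
```

With the numbers in the question (12,626,756 rows, 1,048,571 rows per file) this writes 13 files, the last one shorter than the rest.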