Tags: r, loops, csv, large-files

Split large file in R into smaller files with a loop


I have a CSV file with 12,626,756 rows that I need to split into smaller files so a colleague can open them in Excel. I want to create a loop that splits the file into pieces that fit within Excel's row limit and exports each one as a CSV file until it reaches the end of the data (it should produce 13 files).

#STEP 1: load data
data <- read.csv(".../Desktop/Data/file.csv", header = TRUE)

#STEP 2: count rows
totalrows <- nrow(data)

#STEP 3: determine how many splits you need
excelrowlimit <- 1048576 - 5   # leave a little headroom under Excel's 1,048,576-row limit
filesrequired <- ceiling(totalrows / excelrowlimit)

For example:

csvfile 1 should contain rows 1:1048571
csvfile 2 should contain rows 1048572:2097142
csvfile 3 should contain rows 2097143:3145713
csvfile 4 should contain rows 3145714:4194284
... and so on
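
In other words, the row range for chunk i ought to come out of the arithmetic below (just a sketch, using excelrowlimit and totalrows from Step 3):

# Sketch: row range for chunk i, using excelrowlimit and totalrows from Step 3
i <- 2                                       # e.g. the second output file
start <- (i - 1) * excelrowlimit + 1         # 1048572
end   <- min(i * excelrowlimit, totalrows)   # 2097142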

How can I write a loop that (1) splits the data into the number of files needed and (2) gives each CSV export a different file name?


Solution

  • Here's a solution expanding on my comment above. It should have a smaller memory requirement than the other approaches, since it does not require copying all or part of the original data frame into new objects.

    library(tidyverse)

    rowCount <- 1048571  # rows per output file, just under Excel's limit
    data %>%
      # assign each row to a chunk of at most rowCount rows
      mutate(Group = ceiling(row_number() / rowCount)) %>%
      group_by(Group) %>%
      # write each chunk as it is visited; .x is the chunk's data
      # (without the Group column) and .y holds the group key
      group_walk(
        function(.x, .y) {
          write.csv(.x, file = paste0("file", .y$Group, ".csv"))
        }
      )
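
    For comparison, a plain base-R version of the explicit loop the question asks for could look roughly like this. It is only a sketch, and it assumes the data, totalrows, excelrowlimit, and filesrequired objects from Steps 1-3 of the question.

    # Sketch: one CSV per chunk, with a numbered file name
    # (assumes data, totalrows, excelrowlimit, filesrequired from the question)
    for (i in seq_len(filesrequired)) {
      start <- (i - 1) * excelrowlimit + 1
      end   <- min(i * excelrowlimit, totalrows)
      chunk <- data[start:end, , drop = FALSE]
      # row.names = FALSE keeps write.csv from adding an extra index column
      write.csv(chunk, file = paste0("file", i, ".csv"), row.names = FALSE)
    }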