Filtering rows between zero values and save as new dataframes or datatables

I have a large csv dataset with more than 45k rows and 19 different variables. I'd like to filter it by a specific variable (V4) so that each filtered group starts with 0 and then the next 0 will mark the start of a new group/dataframe/datatable, while keeping all other variables inside this new table as well. I need those separate groups to further analyse each case of data. I tried:

filtered_data <- my_data %>%       
group_by("V4") %>%       
filter("V4" == 0 & "V4" !=0)      
View(filtered_data)

The first "V4" == 0 seems to work but I'm struggling how to define the end of each filtered dataframe e.g. how to filter from 0 to 3, then 0 to 5 etc. How can I determine the length of each case? Is there a logical operator that saves each group before V4 turns 0 again? Or would it be better to create a loop?

Example of my_data:

        V1    V2    V3    V4    .    .    .    V19 
    1                     0
    2                     1
    3                     2
    4        `            3
    5                     0
    6                     1
    7                     2
    8                     3
    9                     4
    10                    5
    11                    0
   ...
   45k

Solution

Here is a way to group your rows with basic arithmetic. I create the groups using a cumulative sum of an indicator variable (V4 is 0 or not) and split the data.frame into single dataframes using group_split.

# example data 12000 rows in total, 4000 groups of 3 rows
df <- data.frame(V1 = 1:12000, 
                 V2 = sample(LETTERS, 12000, replace = T), 
                 V4 = rep(0:2, 4000))

df <- df %>%
  mutate(Groups = ifelse(V4 == 0, 1, 0),
         Groups = cumsum(Groups)) %>%
  group_split(Groups)

So the first group/dataframe is

> df[[1]]
# A tibble: 3 x 4
     V1 V2       V4 Groups
  <int> <chr> <int>  <dbl>
1     1 L         0      1
2     2 L         1      1
3     3 Y         2      1

the second

> df[[2]]
# A tibble: 3 x 4
     V1 V2       V4 Groups
  <int> <chr> <int>  <dbl>
1     4 Z         0      2
2     5 N         1      2
3     6 Y         2      2

and so on.

If you want to save each data.frame seperately you could use something like this:

# new environment that holds all data.frames
dfEnv <- new.env()

df %>%
  mutate(Groups = ifelse(V4 == 0, 1, 0),
         Groups = cumsum(Groups)) %>%
  group_by(Groups) %>%
  do({
    # save every group inside the new environment as a single data.frame
    dfEnv[[paste0("Group_", unique(.$Groups))]] <- .
  })

Now you have dfEnv$Group_1, dfEnv$Group_2, ... and so on. Inside do() you could also use saveRDS or write.csv to save the data to disk.