Looping over every day in several fiscal years in R

I have a dataset with fiscal year, customer ID, customer type, entrance dates, and release dates, and I'm trying to create a dataframe with daily summary info of counts of active customers for each customer type. The reason I'm working this is as an intermediary step prior to doing some time series analysis to predict future states.

Minimum Working Dataset

#### SETUP ----
library(tidyverse)

#### Minimum Working Example DF ----
set.seed(789456)

mwe_2020 <- 
  # Create initial table with year, ID, customer type, and entrance date
  tibble(
    fy_year       = rep(2020, 4000),
    cus_id        = sample(10:100000, 4000, replace = FALSE),
    cus_gr        = sample(0:12, 4000, replace = TRUE),
    entrance_date = as_date("2019-07-01")
  )

mwe_release <- 
  # Create a dataframe of dates between the entrance date and the end of the FY
  tibble(
    release_date = seq.Date(as_date("2019-07-01"), as_date("2020-06-30"), "day")
  ) %>% 
  mutate(
    release_day = weekdays(release_date)
  ) %>% 
  filter(
    # Remove weekends as no releases are processed on the weekend
    !grepl("Sat|Sun", release_day)
  )

mwe_release_sample <- mwe_2020 %>% 
  # Take a sample of 1000 records to apply a random release date to
  slice_sample(n = 1000) %>% 
  mutate(
    release_date = sample(mwe_release$release_date, 1000, replace = TRUE)
  ) %>% 
  select(
    cus_id,
    release_date
  )

mwe_2020 <- mwe_2020 %>% 
  # Join the two dfs to get the final MWE data
  left_join(mwe_release_sample, by = "cus_id")

mwe_2021 <- 
  # Create initial table with year, ID, customer type, and entrance date
  tibble(
    fy_year       = rep(2021, 4000),
    cus_id        = sample(10:100000, 4000, replace = FALSE),
    cus_gr        = sample(0:12, 4000, replace = TRUE),
    entrance_date = as_date("2020-07-01")
  )

mwe_release <- 
  # Create a dataframe of dates between the entrance date and the end of the FY
  tibble(
    release_date = seq.Date(as_date("2020-07-01"), as_date("2021-06-30"), "day")
  ) %>% 
  mutate(
    release_day = weekdays(release_date)
  ) %>% 
  filter(
    # Remove weekends as no releases are processed on the weekend
    !grepl("Sat|Sun", release_day)
  )

mwe_release_sample <- mwe_2021 %>% 
  # Take a sample of 1000 records to apply a random release date to
  slice_sample(n = 1000) %>% 
  mutate(
    release_date = sample(mwe_release$release_date, 1000, replace = TRUE)
  ) %>% 
  select(
    cus_id,
    release_date
  )

mwe_2021 <- mwe_2021 %>% 
  # Join the two dfs to get the final MWE data
  left_join(mwe_release_sample, by = "cus_id")

rm(mwe_release, mwe_release_sample)

mwe <- bind_rows(mwe_2020, mwe_2021)

rm(mwe_2020, mwe_2021)

So, you can see with the above data, we have a total of 8,000 rows spanning two fiscal years. The actual data is something like 500,000 rows spanning five fiscal years.

Attempted Solution

I was able to get what I'm looking for to work for a single fiscal year with the following:

#### Output ----
mwe_final <- 
  tibble(
    cus_gr = "",
    key_date = "",
    n_cus = ""
  ) %>% 
  mutate(
    cus_gr = as.integer(cus_gr),
    key_date = as_date(key_date),
    n_cus = as.integer(n_cus)
  )

for (i in seq.Date(as_date("2019-07-01"), as_date("2020-06-30"), "day")) {

  mwe_active <- mwe %>% 
    filter(
      fy_year == 2020,
      is.na(release_date) | release_date >= as_date(i)
    ) %>% 
    mutate(key_date = as_date(i)) %>% 
    group_by(cus_gr, key_date) %>% 
    summarise(n_cus = n(), .groups = "drop")
  
  mwe_final <- mwe_final %>% 
    bind_rows(mwe_active)
}

This gave me the expected output of one row per cus_gr per day in the selected fiscal year:

> head(mwe_final)
# A tibble: 6 × 3
  cus_gr key_date   n_cus
   <int> <date>     <int>
1     NA NA            NA
2      0 2019-09-03   303
3      1 2019-09-03   292
4      2 2019-09-03   295
5      3 2019-09-03   317
6      4 2019-09-03   291

> tail(mwe_final)
# A tibble: 6 × 3
  cus_gr key_date   n_cus
   <int> <date>     <int>
1      7 2020-06-30   244
2      8 2020-06-30   227
3      9 2020-06-30   225
4     10 2020-06-30   233
5     11 2020-06-30   235
6     12 2020-06-30   215

Where I'm struggling

I'm not sure how to loop over each day in each fiscal year. My expected output should look like the above, but instead of dates 2019-07-01 through 2020-06-30, I need the script to run to present day over each fiscal year in the dataset. I'm trying to make this script as hands-off as possible so it can be run anytime with loading in the new data and looping from the beginning of the dataset and earliest fiscal year through the end of the dataset to whatever the current day is or the end of the most recent fiscal year.

I'm guessing I'll need to create a new table with the fiscal years and dates something like the following:

# Fiscal years and dates
fy_dates <- tibble(
  fy = 2020:2025,
  start = c("2019-07-01", "2020-07-01", "2021-07-01", "2022-07-01", "2023-07-01", "2024-07-01")
  end   = c("2020-06-30", "2021-06-30", "2022-06-30", "2023-06-30", "2024-06-30", "2025-06-30")

But I'm not sure how to combine this new information into the for loop itself to make sure it loops correctly. I'd love some pointers here if anyone can help me out.

Solution

Avoid calling bind_rows inside a loop which can cause excessive copying. Instead, build a list of tibbles using lapply inside a call to Map (elementwise loop and wrapper of mapply) using your fiscal years tibble. Then, call bind_rows outside of loop. Of course, there are purrr synonyms such as map and pmap.

aggregate_release <- function(fy, start, end) {
  seq_dts <- seq.Date(start, end, "day")

  tibbles <- lapply(seq_dts, \(dt) {
    mwe %>% 
      filter(fy_year == fy, is.na(release_date) | release_date >= as_date(dt)) %>% 
      mutate(key_date = as_date(dt)) %>% 
      group_by(cus_gr, key_date) %>% 
      summarise(n_cus = n(), .groups = "drop")
  })

  bind_rows(tibbles)
}

release_fy_tibbles <- with(fy_dates, Map(aggregate_release, fy=fy, start=start, end=end))

release_final <- bind_rows(release_fy_tibbles)