Add rows to dataframe in dplyr pipe based on missing values

I have a dataframe that looks like this:

date	fishery	tournament_day	angler	ffs_period	used_ffs
2025-01-30	Lake Conroe	1	Martin Villa	P1	TRUE
2025-01-31	Lake Conroe	2	Martin Villa	P2	TRUE
2025-02-1	Lake Conroe	3	Martin Villa	P1	TRUE
2025-02-13	Harris Chain	1	Martin Villa	P3	TRUE

structure(list(date = structure(c(1738195200, 1738281600, 1738368000, 
1739404800, 1741219200, 1741305600, 1743638400, 1743724800, 1743811200
), tzone = "UTC", class = c("POSIXct", "POSIXt")), fishery = c("Lake Conroe", 
"Lake Conroe", "Lake Conroe", "Harris Chain", "Lake Murray", 
"Lake Murray", "Lake Guntersville", "Lake Guntersville", "Lake Guntersville"
), tournament_day = c(1, 2, 3, 1, 1, 2, 1, 2, 3), angler = c("Martin Villa", 
"Martin Villa", "Martin Villa", "Martin Villa", "Martin Villa", 
"Martin Villa", "Martin Villa", "Martin Villa", "Martin Villa"
), ffs_period = c("P1", "P2", "P1", "P3", "P1", "P1", "P3", "P2", 
"P1"), used_ffs = c(TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, 
TRUE, TRUE)), row.names = c(NA, -9L), class = c("tbl_df", "tbl", 
"data.frame"))

My issue is there are 3 "ffs_period" observations for each unique date: P1, P2, P3. I only have data for the observation that "used_ffs" is "TRUE". For the additional two observations for each date, fishery, tournament day and angler I need to populate the remaining two rows with "ffs_period" and "used_ffs" that is not present. For instance, the first observation where "ffs_period" is P1 and "used_ffs" is TRUE, I would need to populate two rows with the same date, fishery, tournament_day, and angler where "ffs_period" is P2 and P3 and used_ffs is FALSE for both. The end result would look like this:

date	fishery	tournament_day	angler	ffs_period	used_ffs
2025-01-30	Lake Conroe	1	Martin Villa	P1	TRUE
2025-01-30	Lake Conroe	1	Martin Villa	P2	FALSE
2025-01-30	Lake Conroe	1	Martin Villa	P3	FALSE
2025-01-31	Lake Conroe	2	Martin Villa	P1	FALSE
2025-01-31	Lake Conroe	2	Martin Villa	P2	TRUE
2025-01-31	Lake Conroe	2	Martin Villa	P3	FALSE
2025-02-1	Lake Conroe	3	Martin Villa	P1	TRUE
2025-02-1	Lake Conroe	3	Martin Villa	P2	FALSE
2025-02-1	Lake Conroe	3	Martin Villa	P3	FALSE
2025-02-13	Harris Chain	1	Martin Villa	P1	FALSE
2025-02-13	Harris Chain	1	Martin Villa	P2	FALSE
2025-02-13	Harris Chain	1	Martin Villa	P3	TRUE

Is this possible to do? Or is this overly complicated?

Solution

tidyr::complete would be good here. Since you want all ffs_period for each existing combination of date/fishery/tournament/angler, we can use nesting to limit the output to existing combinations of those, and specify the desired levels of ffs_period to make sure all three appear for every combo, even if not all three appear in the example data.

We can also use the fill parameter to specify the used_ffs value for added rows.

library(tidyr)
df[1:4,] |>
  complete(nesting(date, fishery, tournament_day, angler),
           ffs_period = c("P1", "P2", "P3"),
           fill = list(used_ffs = FALSE))

Result


# A tibble: 12 × 6
   date                fishery      tournament_day angler       ffs_period used_ffs
   <dttm>              <chr>                 <dbl> <chr>        <chr>      <lgl>   
 1 2025-01-30 00:00:00 Lake Conroe               1 Martin Villa P1         TRUE    
 2 2025-01-30 00:00:00 Lake Conroe               1 Martin Villa P2         FALSE   
 3 2025-01-30 00:00:00 Lake Conroe               1 Martin Villa P3         FALSE   
 4 2025-01-31 00:00:00 Lake Conroe               2 Martin Villa P1         FALSE   
 5 2025-01-31 00:00:00 Lake Conroe               2 Martin Villa P2         TRUE    
 6 2025-01-31 00:00:00 Lake Conroe               2 Martin Villa P3         FALSE   
 7 2025-02-01 00:00:00 Lake Conroe               3 Martin Villa P1         TRUE    
 8 2025-02-01 00:00:00 Lake Conroe               3 Martin Villa P2         FALSE   
 9 2025-02-01 00:00:00 Lake Conroe               3 Martin Villa P3         FALSE   
10 2025-02-13 00:00:00 Harris Chain              1 Martin Villa P1         FALSE   
11 2025-02-13 00:00:00 Harris Chain              1 Martin Villa P2         FALSE   
12 2025-02-13 00:00:00 Harris Chain              1 Martin Villa P3         TRUE