rdataframedplyr

Add rows to dataframe in dplyr pipe based on missing values


I have a dataframe that looks like this:

date fishery tournament_day angler ffs_period used_ffs
2025-01-30 Lake Conroe 1 Martin Villa P1 TRUE
2025-01-31 Lake Conroe 2 Martin Villa P2 TRUE
2025-02-1 Lake Conroe 3 Martin Villa P1 TRUE
2025-02-13 Harris Chain 1 Martin Villa P3 TRUE
structure(list(date = structure(c(1738195200, 1738281600, 1738368000, 
1739404800, 1741219200, 1741305600, 1743638400, 1743724800, 1743811200
), tzone = "UTC", class = c("POSIXct", "POSIXt")), fishery = c("Lake Conroe", 
"Lake Conroe", "Lake Conroe", "Harris Chain", "Lake Murray", 
"Lake Murray", "Lake Guntersville", "Lake Guntersville", "Lake Guntersville"
), tournament_day = c(1, 2, 3, 1, 1, 2, 1, 2, 3), angler = c("Martin Villa", 
"Martin Villa", "Martin Villa", "Martin Villa", "Martin Villa", 
"Martin Villa", "Martin Villa", "Martin Villa", "Martin Villa"
), ffs_period = c("P1", "P2", "P1", "P3", "P1", "P1", "P3", "P2", 
"P1"), used_ffs = c(TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, 
TRUE, TRUE)), row.names = c(NA, -9L), class = c("tbl_df", "tbl", 
"data.frame"))

My issue is there are 3 "ffs_period" observations for each unique date: P1, P2, P3. I only have data for the observation that "used_ffs" is "TRUE". For the additional two observations for each date, fishery, tournament day and angler I need to populate the remaining two rows with "ffs_period" and "used_ffs" that is not present. For instance, the first observation where "ffs_period" is P1 and "used_ffs" is TRUE, I would need to populate two rows with the same date, fishery, tournament_day, and angler where "ffs_period" is P2 and P3 and used_ffs is FALSE for both. The end result would look like this:

date fishery tournament_day angler ffs_period used_ffs
2025-01-30 Lake Conroe 1 Martin Villa P1 TRUE
2025-01-30 Lake Conroe 1 Martin Villa P2 FALSE
2025-01-30 Lake Conroe 1 Martin Villa P3 FALSE
2025-01-31 Lake Conroe 2 Martin Villa P1 FALSE
2025-01-31 Lake Conroe 2 Martin Villa P2 TRUE
2025-01-31 Lake Conroe 2 Martin Villa P3 FALSE
2025-02-1 Lake Conroe 3 Martin Villa P1 TRUE
2025-02-1 Lake Conroe 3 Martin Villa P2 FALSE
2025-02-1 Lake Conroe 3 Martin Villa P3 FALSE
2025-02-13 Harris Chain 1 Martin Villa P1 FALSE
2025-02-13 Harris Chain 1 Martin Villa P2 FALSE
2025-02-13 Harris Chain 1 Martin Villa P3 TRUE

Is this possible to do? Or is this overly complicated?


Solution

  • tidyr::complete would be good here. Since you want all ffs_period for each existing combination of date/fishery/tournament/angler, we can use nesting to limit the output to existing combinations of those, and specify the desired levels of ffs_period to make sure all three appear for every combo, even if not all three appear in the example data.

    We can also use the fill parameter to specify the used_ffs value for added rows.

    library(tidyr)
    df[1:4,] |>
      complete(nesting(date, fishery, tournament_day, angler),
               ffs_period = c("P1", "P2", "P3"),
               fill = list(used_ffs = FALSE))
    

    Result

    
    # A tibble: 12 × 6
       date                fishery      tournament_day angler       ffs_period used_ffs
       <dttm>              <chr>                 <dbl> <chr>        <chr>      <lgl>   
     1 2025-01-30 00:00:00 Lake Conroe               1 Martin Villa P1         TRUE    
     2 2025-01-30 00:00:00 Lake Conroe               1 Martin Villa P2         FALSE   
     3 2025-01-30 00:00:00 Lake Conroe               1 Martin Villa P3         FALSE   
     4 2025-01-31 00:00:00 Lake Conroe               2 Martin Villa P1         FALSE   
     5 2025-01-31 00:00:00 Lake Conroe               2 Martin Villa P2         TRUE    
     6 2025-01-31 00:00:00 Lake Conroe               2 Martin Villa P3         FALSE   
     7 2025-02-01 00:00:00 Lake Conroe               3 Martin Villa P1         TRUE    
     8 2025-02-01 00:00:00 Lake Conroe               3 Martin Villa P2         FALSE   
     9 2025-02-01 00:00:00 Lake Conroe               3 Martin Villa P3         FALSE   
    10 2025-02-13 00:00:00 Harris Chain              1 Martin Villa P1         FALSE   
    11 2025-02-13 00:00:00 Harris Chain              1 Martin Villa P2         FALSE   
    12 2025-02-13 00:00:00 Harris Chain              1 Martin Villa P3         TRUE