I have a dataframe that looks like this:
date | fishery | tournament_day | angler | ffs_period | used_ffs |
---|---|---|---|---|---|
2025-01-30 | Lake Conroe | 1 | Martin Villa | P1 | TRUE |
2025-01-31 | Lake Conroe | 2 | Martin Villa | P2 | TRUE |
2025-02-01 | Lake Conroe | 3 | Martin Villa | P1 | TRUE |
2025-02-13 | Harris Chain | 1 | Martin Villa | P3 | TRUE |
df <- structure(list(date = structure(c(1738195200, 1738281600, 1738368000,
1739404800, 1741219200, 1741305600, 1743638400, 1743724800, 1743811200
), tzone = "UTC", class = c("POSIXct", "POSIXt")), fishery = c("Lake Conroe",
"Lake Conroe", "Lake Conroe", "Harris Chain", "Lake Murray",
"Lake Murray", "Lake Guntersville", "Lake Guntersville", "Lake Guntersville"
), tournament_day = c(1, 2, 3, 1, 1, 2, 1, 2, 3), angler = c("Martin Villa",
"Martin Villa", "Martin Villa", "Martin Villa", "Martin Villa",
"Martin Villa", "Martin Villa", "Martin Villa", "Martin Villa"
), ffs_period = c("P1", "P2", "P1", "P3", "P1", "P1", "P3", "P2",
"P1"), used_ffs = c(TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE,
TRUE, TRUE)), row.names = c(NA, -9L), class = c("tbl_df", "tbl",
"data.frame"))
My issue is that each unique date should have three "ffs_period" observations: P1, P2, and P3. I only have data for the observation where "used_ffs" is TRUE. For each combination of date, fishery, tournament_day, and angler, I need to add the two missing rows, with the "ffs_period" values that are not present and "used_ffs" set to FALSE. For instance, for the first observation, where "ffs_period" is P1 and "used_ffs" is TRUE, I would need to add two rows with the same date, fishery, tournament_day, and angler, where "ffs_period" is P2 and P3 and "used_ffs" is FALSE for both. The end result would look like this:
date | fishery | tournament_day | angler | ffs_period | used_ffs |
---|---|---|---|---|---|
2025-01-30 | Lake Conroe | 1 | Martin Villa | P1 | TRUE |
2025-01-30 | Lake Conroe | 1 | Martin Villa | P2 | FALSE |
2025-01-30 | Lake Conroe | 1 | Martin Villa | P3 | FALSE |
2025-01-31 | Lake Conroe | 2 | Martin Villa | P1 | FALSE |
2025-01-31 | Lake Conroe | 2 | Martin Villa | P2 | TRUE |
2025-01-31 | Lake Conroe | 2 | Martin Villa | P3 | FALSE |
2025-02-01 | Lake Conroe | 3 | Martin Villa | P1 | TRUE |
2025-02-01 | Lake Conroe | 3 | Martin Villa | P2 | FALSE |
2025-02-01 | Lake Conroe | 3 | Martin Villa | P3 | FALSE |
2025-02-13 | Harris Chain | 1 | Martin Villa | P1 | FALSE |
2025-02-13 | Harris Chain | 1 | Martin Villa | P2 | FALSE |
2025-02-13 | Harris Chain | 1 | Martin Villa | P3 | TRUE |
Is this possible to do? Or is this overly complicated?
`tidyr::complete` would be good here. Since you want every `ffs_period` for each existing combination of date/fishery/tournament_day/angler, we can use `nesting` to limit the output to combinations of those columns that actually occur in the data, and specify the desired levels of `ffs_period` to make sure all three appear for every combination, even if not all three appear in the example data. We can also use the `fill` parameter to specify the `used_ffs` value for the added rows.
library(tidyr)

# df is the data frame from the question; rows 1:4 match the example table
df[1:4, ] |>
  complete(nesting(date, fishery, tournament_day, angler),
           ffs_period = c("P1", "P2", "P3"),
           fill = list(used_ffs = FALSE))
Result
# A tibble: 12 × 6
date fishery tournament_day angler ffs_period used_ffs
<dttm> <chr> <dbl> <chr> <chr> <lgl>
1 2025-01-30 00:00:00 Lake Conroe 1 Martin Villa P1 TRUE
2 2025-01-30 00:00:00 Lake Conroe 1 Martin Villa P2 FALSE
3 2025-01-30 00:00:00 Lake Conroe 1 Martin Villa P3 FALSE
4 2025-01-31 00:00:00 Lake Conroe 2 Martin Villa P1 FALSE
5 2025-01-31 00:00:00 Lake Conroe 2 Martin Villa P2 TRUE
6 2025-01-31 00:00:00 Lake Conroe 2 Martin Villa P3 FALSE
7 2025-02-01 00:00:00 Lake Conroe 3 Martin Villa P1 TRUE
8 2025-02-01 00:00:00 Lake Conroe 3 Martin Villa P2 FALSE
9 2025-02-01 00:00:00 Lake Conroe 3 Martin Villa P3 FALSE
10 2025-02-13 00:00:00 Harris Chain 1 Martin Villa P1 FALSE
11 2025-02-13 00:00:00 Harris Chain 1 Martin Villa P2 FALSE
12 2025-02-13 00:00:00 Harris Chain 1 Martin Villa P3 TRUE
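
To see why `nesting` matters here, the following is a minimal sketch on made-up toy data (the `toy` data frame and its `day` and `lake` columns are hypothetical, not from the question). It compares `complete` with and without `nesting` for the grouping columns:

```r
library(tidyr)

# Hypothetical toy data (not from the question): two existing day/lake pairs
toy <- data.frame(
  day        = c(1, 2),
  lake       = c("A", "B"),
  ffs_period = c("P1", "P3"),
  used_ffs   = c(TRUE, TRUE)
)

# With nesting(): only day/lake pairs present in the data are kept,
# each expanded to all three periods (2 pairs x 3 periods)
with_nesting <- complete(toy, nesting(day, lake),
                         ffs_period = c("P1", "P2", "P3"),
                         fill = list(used_ffs = FALSE))

# Without nesting(): day and lake are crossed with each other too, so pairs
# that never occurred (day 1 / lake B, day 2 / lake A) are also created
# (2 days x 2 lakes x 3 periods)
without_nesting <- complete(toy, day, lake,
                            ffs_period = c("P1", "P2", "P3"),
                            fill = list(used_ffs = FALSE))

nrow(with_nesting)     # 6
nrow(without_nesting)  # 12
```

In both cases only the two original rows have `used_ffs` equal to TRUE; the difference is that the un-nested call invents day/lake combinations that were never fished, which is presumably not what you want for the tournament data.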