I am working with a dataset in R. The dataset has MANY date variables, and I want to subset the data frame to the rows with dates between 2023-01-01 and 2023-12-31.
I know I can use the command based on one variable:
df2023 <- df[df$"variable.name" >= "2023-01-01" & df$"variable.name" <= "2023-12-31", ]
But I need to somehow "screen" all columns: if a 2023 date appears in any column, the entire row/record should be included. Is this possible?
The dataset contains numerical, character, categorical and date variables.
Thanks.
Questions to SO should include test data. Since you are relatively new, we will provide it here using the data shown in the Note at the end.
1) As an example, we extract all rows having 2021 in any date column. Within each row, compare the year of each date to 2021 and keep the row if any of those comparisons is TRUE. Note that the -1 in across means all columns except the first.
library(dplyr)
library(lubridate) # year
dat %>%
rowwise %>%
filter(any(across(-1, year) == 2021, na.rm = TRUE) ) %>%
ungroup
## # A tibble: 2 × 3
## SSN date_today date_adm
## <dbl> <chr> <chr>
## 1 101 2021-07-09 <NA>
## 2 666 1914-01-01 2021-04-07
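A possible vectorised variant of the same idea, as a sketch only: if_any (available in dplyr 1.0.4 or later) tests the condition column by column, so rowwise is not needed. The data frame dat2, its values, and the names lo and hi are made up for illustration, and the date columns are assumed to be of class Date rather than character. The range is the 2023 window from the question.

```r
library(dplyr)

# Hypothetical data: one id column plus two Date columns
dat2 <- data.frame(
  id = 1:4,
  d1 = as.Date(c("2022-05-01", "2023-02-10", NA, "2024-01-01")),
  d2 = as.Date(c("2023-12-31", "2021-01-01", "2022-06-30", NA))
)

# Bounds of the window (hypothetical names)
lo <- as.Date("2023-01-01")
hi <- as.Date("2023-12-31")

# Keep rows where at least one Date column falls inside [lo, hi].
# where(...) selects only the Date columns, so other column types are ignored;
# !is.na(.x) makes missing dates count as "not in range".
res <- dat2 %>%
  filter(if_any(where(\(x) inherits(x, "Date")),
                ~ !is.na(.x) & .x >= lo & .x <= hi))
res
```

Here rows 1 and 2 are kept, because each has at least one date in 2023, while rows 3 and 4 are dropped.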
2) For a base R approach, first create a function that converts a character string to a year. Then compare that year to 2021, determine whether any of the comparisons in each row is TRUE, and hand the result to subset. Note that we use [-1] to exclude the first column since it is not a date column.
yr <- function(x) as.numeric(substr(x, 1, 4))
subset(dat, apply(dat[-1], 1, \(x) any(yr(x) == 2021, na.rm = TRUE)))
## SSN date_today date_adm
## 3 101 2021-07-09 <NA>
## 4 666 1914-01-01 2021-04-07
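For a data frame that mixes numeric, character, factor and date columns, as described in the question, one option is to pick out the date columns by class instead of by position. The following is a minimal sketch under the assumption that the date columns are already of class Date; the data frame df, its column names, and the helper in_2023 are hypothetical.

```r
# Hypothetical mixed-type data: numeric, character, factor and Date columns
df <- data.frame(
  id    = c(1, 2, 3, 4),
  name  = c("a", "b", "c", "d"),
  group = factor(c("x", "y", "x", "y")),
  start = as.Date(c("2022-05-01", "2023-02-10", NA, "2024-01-01")),
  end   = as.Date(c("2023-12-31", "2021-01-01", "2022-06-30", NA))
)

# Identify the Date columns by class
is_date <- vapply(df, inherits, logical(1), what = "Date")

# TRUE where the value lies in 2023; NA dates count as FALSE
in_2023 <- function(x) {
  !is.na(x) & x >= as.Date("2023-01-01") & x <= as.Date("2023-12-31")
}

# One logical vector per Date column, combined element-wise with OR,
# so a row is kept if any of its dates is in 2023
keep <- Reduce(`|`, lapply(df[is_date], in_2023))
df2023 <- df[keep, ]
df2023
```

If the dates are stored as character strings, as in dat, they would need to be converted with as.Date first, or the year could be compared via substr as in the yr function above.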
Note: the test data is taken from this link, except that we have used NA in place of "NA".
dat <- data.frame(
SSN = c(204,401,101,666,777),
date_today = c("1914-01-01","2022-03-12","2021-07-09","1914-01-01","2022-04-05"),
date_adm = c("2020-03-11","2022-03-12",NA,"2021-04-07","2022-04-05"))
dat
## SSN date_today date_adm
## 1 204 1914-01-01 2020-03-11
## 2 401 2022-03-12 2022-03-12
## 3 101 2021-07-09 NA
## 4 666 1914-01-01 2021-04-07
## 5 777 2022-04-05 2022-04-05