I have a large data frame and I want to select the rows that satisfy a condition on its date columns. The data frame is similar to this:
library(tidyverse)
library(lubridate)
curdate <- seq(as.Date("2000/1/1"), by = "month", length.out = 24)
expdate <- rep(seq(as.Date("2000/3/1"), by = "quarter", length.out = 12),2)
afactor <- rep(c("C","P"),12)
anumber <- runif(24)
df <- data.frame(curdate, expdate, afactor, anumber)
df$expdate[12] <- as.Date("2001-02-01")
I would like to get the rows in which the month of the expiration date (expdate) is two months later than the month of the current date (curdate), regardless of the year. In this example, I should select these five rows (rows 1, 7, 12, 13 and 19):
curdate expdate afactor anumber
2000-01-01 2000-03-01 C 0.6832251
2000-07-01 2001-09-01 C 0.2671076
2001-01-01 2000-03-01 C 0.2097065
2001-07-01 2001-09-01 C 0.9258450
2000-12-01 2001-02-01 P 0.4903951
First I used the following line for that:
df_select1 <- df %>% group_by(curdate, afactor) %>%
  filter(month(expdate) == month(curdate)+2)
But it misses the cases where the month of curdate is November or December, because month(curdate)+2 is then 13 or 14. For instance, here it misses the row where curdate is 2000-12-01. So I want to add a condition to deal with these cases. I wrote:
df_select2 <- df %>% group_by(curdate, afactor) %>%
if_else(month(curdate)<11,
filter(month(expdate) == month(curdate)+2),
filter(month(expdate) == month(curdate)-10))
but I get the following error: `condition` must be a logical vector, not a grouped_df/tbl_df/tbl/data.frame object.
I found the following solution, but there are certainly much shorter ways to do it:
df_select1 <- df %>% group_by(curdate, afactor) %>%
  filter(month(curdate)<11) %>%
  filter(month(expdate) == month(curdate)+2)
df_select2 <- df %>% group_by(curdate, afactor) %>%
  filter(month(curdate)>10) %>%
  filter(month(expdate) == month(curdate)-10)
df_select <- full_join(df_select1, df_select2)
If you're loading lubridate anyway, you should probably also use its functions for month arithmetic. Months are a bit tricky because they are not of equal length, which is why the base function difftime does not offer a monthly unit, for example.
This would be a solution for your problem without the if_else function, provided expdate should be exactly two calendar months after curdate (year included):
df_select1 <- df %>% group_by(curdate, afactor) %>%
  filter(expdate == curdate + months(2))
By the way, you won't run into problems as long as every date in your data is the first day of its month. You have to decide what should happen in cases like the following, though:
ymd("2019-08-31")+months(1)
ymd("2019-01-29")+months(1)
Both of these return NA, because the resulting dates (2019-09-31 and 2019-02-29) do not exist. If this can happen in your data, lubridate::add_with_rollback() could offer a solution, depending on your needs.
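As a rough illustration of the rollback behaviour, here is a minimal sketch using lubridate's %m+% operator and add_with_rollback(). The variable names (end_aug, late_jan, rolled_forward) are only placeholders for this example, and which of the two rollback directions is right depends on your data.

```r
library(lubridate)

# Plain month arithmetic gives NA when the target day does not exist.
plain <- ymd("2019-08-31") + months(1)        # NA (there is no 2019-09-31)

# %m+% rolls back to the last valid day of the target month instead.
end_aug  <- ymd("2019-08-31") %m+% months(1)  # 2019-09-30
late_jan <- ymd("2019-01-29") %m+% months(1)  # 2019-02-28

# add_with_rollback() behaves like %m+% by default; with roll_to_first = TRUE
# it moves forward to the first day of the following month instead.
rolled_forward <- add_with_rollback(ymd("2019-08-31"), months(1),
                                    roll_to_first = TRUE)  # 2019-10-01
```

With first-of-month dates, as in the question, none of this matters, since adding months to the first day of a month always yields a valid date.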
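Edit after the question was clarified: if you're looking for the rows whose expdate is two months "later" than the curdate in the specific sense that only the months are compared, regardless of the year, a little modulo arithmetic helps. Because months run from 1 to 12 rather than 0 to 11, the month has to be shifted before taking the remainder and shifted back afterwards; a plain (month + 2) %% 12 would give 0 instead of 12 when curdate is in October:
df %>%
  filter(lubridate::month(expdate) == (lubridate::month(curdate) + 1) %% 12 + 1)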
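To check a month-only comparison against the example data from the question, here is a minimal, self-contained sketch. It writes the wrap-around as (m + 1) %% 12 + 1 so that October maps to 12, November to 1 and December to 2. The helper name two_months_ahead is hypothetical and only used for illustration, and the random anumber column is left out because it plays no part in the selection. The second pipeline shows that the if_else idea from the question also works once the condition sits inside filter() instead of wrapping it.

```r
library(dplyr)
library(lubridate)

# Rebuild the example data from the question (without the random column).
curdate <- seq(as.Date("2000/1/1"), by = "month", length.out = 24)
expdate <- rep(seq(as.Date("2000/3/1"), by = "quarter", length.out = 12), 2)
df <- data.frame(curdate, expdate, afactor = rep(c("C", "P"), 12))
df$expdate[12] <- as.Date("2001-02-01")

# Hypothetical helper: map a month 1..12 to the month two steps ahead,
# wrapping around the end of the year (10 -> 12, 11 -> 1, 12 -> 2).
two_months_ahead <- function(m) (m + 1) %% 12 + 1

# Month-only comparison via modulo arithmetic.
df_mod <- df %>%
  filter(month(expdate) == two_months_ahead(month(curdate)))

# Same selection with if_else() used inside filter().
df_ifelse <- df %>%
  filter(month(expdate) == if_else(month(curdate) < 11,
                                   month(curdate) + 2,
                                   month(curdate) - 10))
```

Both pipelines should return rows 1, 7, 12, 13 and 19 of the example data. The group_by() step from the question is dropped here because the filter condition is evaluated row by row and does not depend on the groups.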