Tags: r, dplyr, na, bizdays

R: unexpected behaviour of bizdays::adjust.previous when checking if a date is NA


I am trying to adjust dates in a data frame to business days using the bizdays package. The data frame may contain missing values (NA), so I added an ifelse() call to skip those cells, but this seems to break the code and I don't know why.

This is a small example of the error:

library(bizdays)
library(dplyr)

holidays <- c("2022-03-01",
              "2022-03-07",
              "2022-03-08",
              "2022-03-25")

start_date <- as.Date("01/01/2010", format = "%d/%m/%Y")
end_date   <- as.Date("01/01/2060", format = "%d/%m/%Y")

calendar <- create.calendar("my_cal",
                            holidays = holidays,
                            weekdays = c("saturday", "sunday"),
                            start.date = start_date,
                            end.date = end_date)

bizdays.options$set(default.calendar="my_cal")


date_1 <- "2022-03-13" # sunday
print(adjust.previous(date_1)) # friday "2022-03-11"

# All days of March 2022 as strings: "2022-03-01", ..., "2022-03-31"
days <- sprintf("2022-03-%02d", 1:31)

df <- data.frame(days = days)

df_1 <- df %>% mutate(days_1 = adjust.previous(days))

head(df_1) # correct
#        days     days_1
#1 2022-03-01 2022-02-28
#2 2022-03-02 2022-03-02
#3 2022-03-03 2022-03-03
#4 2022-03-04 2022-03-04
#5 2022-03-05 2022-03-04
#6 2022-03-06 2022-03-04

df_2 <- df %>% mutate(days_2 = ifelse(is.na(days),
                                      days,
                                      adjust.previous(days)))

head(df_2) # date is converted to a number
#        days days_2
#1 2022-03-01  19051
#2 2022-03-02  19053
#3 2022-03-03  19054
#4 2022-03-04  19055
#5 2022-03-05  19055
#6 2022-03-06  19055

Solution

  • This has nothing to do with the bizdays package. It is caused by ifelse(), which drops the Date class and returns the underlying numeric value (the number of days since 1970-01-01). See this example:

    class(Sys.Date()) # "Date"
    ifelse(TRUE, Sys.Date(), Sys.Date()) # 19066 (the exact number depends on the current date)
    class(ifelse(TRUE, Sys.Date(), Sys.Date())) # "numeric"
    

    By contrast, a plain if keeps the class:

    if(TRUE) class(Sys.Date()) # Date
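
    To make the mechanism concrete, here is a minimal base R sketch. The vector d is made up for illustration and is not taken from your data. It shows that ifelse() returns the bare day counts, and that one possible fix is to convert those numbers back with as.Date() and an explicit origin:

    ```r
    # A Date is stored as the number of days since 1970-01-01 plus a class attribute.
    d <- as.Date(c("2022-03-11", NA, "2022-03-14"))

    # ifelse() builds its result from the logical test vector,
    # so the Date class of the `no` argument is lost.
    stripped <- ifelse(is.na(d), NA, d)
    class(stripped)  # "numeric"

    # Converting the day counts back restores the dates.
    restored <- as.Date(stripped, origin = "1970-01-01")
    class(restored)  # "Date"
    print(restored)  # "2022-03-11" NA "2022-03-14"
    ```

    This only repairs the class after the fact; avoiding ifelse() for dates altogether is usually simpler.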
    

    In your case the ifelse() seems unnecessary, because adjust.previous() already handles NA values:

    df$days[1] = NA
    df_2 <- df %>% mutate(
        days_2 = adjust.previous(days)
    )
    
    # Seems to work
    head(df_2)
    #         days     days_2
    # 1       <NA>       <NA>
    # 2 2022-03-02 2022-03-02
    # 3 2022-03-03 2022-03-03
    # 4 2022-03-04 2022-03-04
    # 5 2022-03-05 2022-03-04
    # 6 2022-03-06 2022-03-04
    

    However, if this doesn't work on your real data, I would step outside dplyr, which is great but slightly weaker when it comes to assigning into a subset of a column, and do it in base R:

    df_3 <- df
    df_3$days_3 <- as.Date(0, origin = "1970-01-01") # Create a Date column
    df_3$days_3[is.na(df_3$days)] <- NA # Fill NA
    df_3$days_3[!is.na(df_3$days)] <- adjust.previous(df_3$days[!is.na(df_3$days)]) # Fill values
    
    # Output as above
    head(df_3)
    #         days     days_3
    # 1       <NA>       <NA>
    # 2 2022-03-02 2022-03-02
    # 3 2022-03-03 2022-03-03
    # 4 2022-03-04 2022-03-04
    # 5 2022-03-05 2022-03-04
    # 6 2022-03-06 2022-03-04
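
    If you need this pattern in several places, the base R steps above could be wrapped in a small helper. This is only a sketch: adjust_non_na() and previous_day() are hypothetical names, and previous_day() is a stand-in for adjust.previous() so that the example runs without bizdays or a calendar. It also assumes the input is either Date or ISO "YYYY-MM-DD" strings.

    ```r
    # Hypothetical helper: apply `f` only to the non-missing entries of `x`
    # and return a Date vector of the same length, with NA where `x` is NA.
    adjust_non_na <- function(x, f) {
      x <- as.Date(x)                      # assumes Dates or "YYYY-MM-DD" strings
      out <- rep(as.Date(NA), length(x))   # start from an all-NA Date vector
      keep <- !is.na(x)
      out[keep] <- f(x[keep])              # subset assignment keeps the Date class
      out
    }

    # Stand-in for adjust.previous(): shift every date back by one day.
    previous_day <- function(d) d - 1

    res <- adjust_non_na(c("2022-03-05", NA, "2022-03-07"), previous_day)
    class(res)  # "Date"
    print(res)  # "2022-03-04" NA "2022-03-06"
    ```

    With bizdays loaded and a default calendar set, passing adjust.previous as f should give the same result as the df_3 code above.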