r datetime threshold

Is there an R function for applying a threshold?


I have a dataset that looks like this:

start_date end_date
2021-11-28 05:00:00 2022-06-29 04:00:00
2021-09-03 04:00:00 2022-12-04 05:00:00
2021-02-22 05:00:00 2021-03-16 04:00:00
2022-07-18 04:00:00 2022-12-19 04:00:00
2020-01-06 05:00:00 2020-07-05 04:00:00
2021-09-18 04:00:00 2022-03-18 04:00:00
2020-07-02 04:00:00 2020-08-30 04:00:00
2021-03-30 04:00:00 2021-04-27 04:00:00
2021-05-31 04:00:00 2021-11-30 05:00:00
2021-08-05 04:00:00 2022-02-03 05:00:00

I add another column showing the number of days in this “approved” date range, rounded and converted to numeric so I can apply other calculations to it.

dat1$days_approved <- round(as.numeric(difftime(dat1$end_date,dat1$start_date,units=c("days"))),digits = 0)

Now I want to see where we stand, as of today’s date, within each of these time periods. That is, are we halfway through, have we not started, or are we complete?

So I use lubridate’s now() function (with its tzone argument) for “today” and apply some basic division.

dat1$time_progress <- (round(as.numeric(now(tzone = "")-dat1$start_date,units=c("days"))))/dat1$days_approved

That leaves me with a dataset looking like this:

start_date end_date days_approved time_progress
2021-11-28 05:00:00 2022-06-29 04:00:00 213 1.01
2021-09-03 04:00:00 2022-12-04 05:00:00 457 0.661
2021-02-22 05:00:00 2021-03-16 04:00:00 22 22.5
2022-07-18 04:00:00 2022-12-19 04:00:00 154 -0.104
2020-01-06 05:00:00 2020-07-05 04:00:00 181 5.02
2021-09-18 04:00:00 2022-03-18 04:00:00 181 1.59
2020-07-02 04:00:00 2020-08-30 04:00:00 59 12.4
2021-03-30 04:00:00 2021-04-27 04:00:00 28 16.4
2021-05-31 04:00:00 2021-11-30 05:00:00 183 2.17
2021-08-05 04:00:00 2022-02-03 05:00:00 182 1.82

This makes me think I need to set a threshold: if the value is greater than 1, I’d like it to return 1. If it is 1 or less, I’d like it to return the value unchanged.

I can make this work with an ifelse() call:

ifelse(dat1$time_progress > 1, 1, dat1$time_progress)

However, I’m struggling to apply this logic to the column itself. Is there an existing function for applying a threshold that I have not found?


Solution

  • We could create our own threshold function and then apply it to the desired column:

    library(dplyr)
    library(lubridate)
    
    # Cap values at 1; values of 1 or less pass through unchanged
    my_threshold_function <- function(x) {
      ifelse(x > 1, 1, x)
    }
    
    df %>% 
      mutate(across(ends_with("date"), ymd_hms),
             # units = "days" stops difftime from choosing its own unit
             days_approved = round(as.numeric(difftime(end_date, start_date, units = "days")), 0),
             progress = round(as.numeric(difftime(now(tzone = ""), start_date, units = "days"))) /
               days_approved,
             threshold = my_threshold_function(progress))
    
    
       start_date          end_date            days_approved progress threshold
       <dttm>              <dttm>                      <dbl>    <dbl>     <dbl>
     1 2021-11-28 05:00:00 2022-05-29 04:00:00           182     1.18         1
     2 2021-09-03 04:00:00 2022-03-04 05:00:00           182     1.65         1
     3 2021-02-22 05:00:00 2021-03-16 04:00:00            22    22.5          1
     4 2020-09-18 04:00:00 2021-03-19 04:00:00           182     3.58         1
     5 2020-01-06 05:00:00 2020-07-05 04:00:00           181     5.01         1
     6 2021-09-18 04:00:00 2022-03-18 04:00:00           181     1.58         1
     7 2020-07-02 04:00:00 2020-08-30 04:00:00            59    12.4          1
     8 2021-03-30 04:00:00 2021-04-27 04:00:00            28    16.4          1
     9 2021-05-31 04:00:00 2021-11-30 05:00:00           183     2.16         1
    10 2021-08-05 04:00:00 2022-02-03 05:00:00           182     1.81         1
    

    data:

    structure(list(start_date = c("2021-11-28 5:00:00", "2021-09-03 4:00:00", 
    "2021-02-22 5:00:00", "2020-09-18 4:00:00", "2020-01-06 5:00:00", 
    "2021-09-18 4:00:00", "2020-07-02 4:00:00", "2021-03-30 4:00:00", 
    "2021-05-31 4:00:00", "2021-08-05 4:00:00"), end_date = c("2022-05-29 4:00:00", 
    "2022-03-04 5:00:00", "2021-03-16 4:00:00", "2021-03-19 4:00:00", 
    "2020-07-05 4:00:00", "2022-03-18 4:00:00", "2020-08-30 4:00:00", 
    "2021-04-27 4:00:00", "2021-11-30 5:00:00", "2022-02-03 5:00:00"
    )), class = "data.frame", row.names = c(NA, -10L))
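
  • As for an existing function: base R's pmin() and pmax() are vectorized and can act as a threshold without a custom helper. The sketch below is one possible way to use them, not the only one. It takes three rows from the data above, parses them as UTC, and swaps now() for a fixed reference time (ref_time, an assumption made here so the result is reproducible). The column names progress_capped, progress_clamped and status are also made up for illustration.

```r
# Three rows taken from the data above, parsed as UTC (an assumption).
dat1 <- data.frame(
  start_date = as.POSIXct(c("2021-11-28 05:00:00", "2021-02-22 05:00:00",
                            "2021-05-31 04:00:00"), tz = "UTC"),
  end_date   = as.POSIXct(c("2022-05-29 04:00:00", "2021-03-16 04:00:00",
                            "2021-11-30 05:00:00"), tz = "UTC")
)

# Assumption: a fixed reference time instead of now(), for reproducibility.
ref_time <- as.POSIXct("2021-09-01 00:00:00", tz = "UTC")

# Length of each approved period in whole days.
dat1$days_approved <- round(as.numeric(
  difftime(dat1$end_date, dat1$start_date, units = "days")
))

# Share of the period that has elapsed at ref_time (can be < 0 or > 1).
elapsed_days <- as.numeric(difftime(ref_time, dat1$start_date, units = "days"))
dat1$time_progress <- elapsed_days / dat1$days_approved

# pmin() caps each element at 1; pmax() then floors negative values at 0.
dat1$progress_capped  <- pmin(dat1$time_progress, 1)
dat1$progress_clamped <- pmax(dat1$progress_capped, 0)

# Label each period from the unclamped ratio. right = FALSE makes the
# intervals [-Inf, 0), [0, 1) and [1, Inf).
dat1$status <- cut(dat1$time_progress,
                   breaks = c(-Inf, 0, 1, Inf),
                   labels = c("not started", "in progress", "complete"),
                   right = FALSE)

print(dat1)
```

    Assigning the result back with dat1$... <- is what applies the logic to the column; the ifelse() call from the question can be assigned the same way. Whether negative values should be floored at 0 depends on whether a period that has not started should still be distinguishable in the numeric column.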