rdatedatetimeconditional-statementsfiltering

Selecting a value based on a datetime reference


I'm currently working on two data sets one contains the records of repairs of machines in a textile industry and the other data set contains the record of the preventive maintenance that a machine received.

MACHINE REPAIR_DATE
X 6/03/2021
Y 24/02/2021
F 28/02/2021
O 27/02/2021
O 18/02/2021
O 12/02/2021
U 7/02/2021
U 8/01/2021
U 2/01/2021
H 1/02/2021
H 4/02/2021
H 12/03/2021
H 22/03/2021

The second database looks like this

MACHINE MAINTENANCE_DATE
X 4/03/2021
F 29/02/2021
M 26/02/2021
L 25/02/2021
U 16/02/2021
H 10/02/2021
H 5/02/2021
H 6/01/2021

What I am trying to achieve is creating a dataframe where I can see for each machine repair's date when was the very last maintenance date prior to that specific repair (also hopefully using tidyverse since I'm learning but if this can't be solved using tidyverse that's not a problem).

For instance:

MACHINE REPAIR_DATE LAST_MAINTENANCE_DATE_PRIOR_TO_REPAIR
F       28/02/2021           N.A

This is because we have a record of maintenance for machine F that has performed AFTER the repair date, and we are looking for the latest maintenance date before the repair date.

What am I doing?

I thought about arranging both datasets in chronological order to later on joining them then grouping by MACHINE and REPAIR_DATE but I am struggling with conditioning that the latest maintenance date should be prior to the date of repair, and I have to construct a data frame with all machines repair dates and their respective latest maintenance date. I have also some confusion when it comes to joining the table since there will be some N.As if I do a full join by a "dummy variable" or even if a just join the two datasets lets say by doing something like all<-c(dataset1,dataset2).

Code:

ds1<-tribble(
  ~MACHINE,  ~REPAIR_DATE,
  "X",  "6/03/2021",
  "Y",  "24/02/2021",
  "F",  "28/02/2021",
  "O",  "27/02/2021",
  "O",  "18/02/2021",
  "O",  "12/02/2021",
  "U",  "7/02/2021",
  "U",  "8/01/2021",
  "U",  "2/01/2021",
  "H",  "1/02/2021",
  "H",  "4/02/2021",
  "H",  "12/03/2021",
  "H",  "22/03/2021")

ds2<-tribble(
  ~MACHINE,  ~MAINTENANCE_DATE,
  "X",  "4/03/2021",
  "F",  "29/02/2021",
  "M",  "26/02/2021",
  "L",  "25/02/2021",
  "U",  "16/02/2021",
  "H",  "10/02/2021",
  "H",  "5/02/2021",
  "H",  "6/01/2021")

Solution

  • You can try the following :

    library(dplyr)
    library(lubridate)
    
    ds1 %>%
      mutate(REPAIR_DATE = dmy(REPAIR_DATE)) %>%
      left_join(ds2 %>% 
                  mutate(MAINTENANCE_DATE = dmy(MAINTENANCE_DATE)), 
                by = 'MACHINE') %>% 
      group_by(MACHINE, REPAIR_DATE) %>%
      summarise(LAST_MAINTENANCE_DATE = {
          val <- REPAIR_DATE - MAINTENANCE_DATE
          if(any(val > 0, na.rm = TRUE)) 
            max(MAINTENANCE_DATE[val > 0], na.rm = TRUE)
          else NA
        }
        ) %>%
      ungroup
    
    #  MACHINE REPAIR_DATE LAST_MAINTENANCE_DATE
    #   <chr>   <date>      <date>               
    # 1 F       2021-02-28  NA                   
    # 2 H       2021-02-01  2021-01-06           
    # 3 H       2021-02-04  2021-01-06           
    # 4 H       2021-03-12  2021-02-10           
    # 5 H       2021-03-22  2021-02-10           
    # 6 O       2021-02-12  NA                   
    # 7 O       2021-02-18  NA                   
    # 8 O       2021-02-27  NA                   
    # 9 U       2021-01-02  NA                   
    #10 U       2021-01-08  NA                   
    #11 U       2021-02-07  NA                   
    #12 X       2021-03-06  2021-03-04           
    #13 Y       2021-02-24  NA                   
    

    Join the two datasets and for each unique REPAIR_DATE of MACHINE find the last MAINTENANCE_DATE.