rdata.tablelagdifftime

How to get time difference in days since last date (lag) in R using datatable?


person_id  diag_date concept_id  event diff_prev_event
1:         1 2012-01-15    4265600 comorb         NA secs
2:         1 2012-01-15     201820 comorb          0 secs
3:         1 2012-03-15    4265600 comorb    5184000 secs
4:         2 2012-03-15     201820 comorb         NA secs
5:         2 2012-06-22     201820 comorb    8553600 secs
6:         2 2012-06-22    4265600 comorb          0 secs

I am trying to calculate the days since last event for each person. I am running into two issues.

  1. The time difference is showing in seconds. I need to get the days. (5184000 secs = 30 days)
  2. If two days are same dates then the 2nd one is showing 0 when it should be looking at the different date. Row 5 and 6 is same date so they would have the same date difference.

This is the code I tried:

dt[order(diag_date),diff_prev_event := difftime(diag_date, lag( diag_date)), by = c("person_id") ]

Solution

  • Specify the units

    library(data.table)
    dt[order(diag_date),diff_prev_event := difftime(diag_date, 
         lag( diag_date), units = 'days'), by = c("person_id") ]
    

    Then, we grouped by 'person_id' and 'diag_date' and change the values to the max if there are more than one row

    dt[, diff_prev_event := if(.N > 1) max(diff_prev_event, 
        na.rm = TRUE) else diff_prev_event, .(person_id, diag_date)]
    > dt
       person_id  diag_date concept_id  event diff_prev_event
           <int>     <Date>      <int> <char>      <difftime>
    1:         1 2012-01-15    4265600 comorb          0 days
    2:         1 2012-01-15     201820 comorb          0 days
    3:         1 2012-03-15    4265600 comorb         60 days
    4:         2 2012-03-15     201820 comorb         NA days
    5:         2 2012-06-22     201820 comorb         99 days
    6:         2 2012-06-22    4265600 comorb         99 days
    

    -output

    data

    dt <- structure(list(person_id = c(1L, 1L, 1L, 2L, 2L, 2L), diag_date = structure(c(15354, 
    15354, 15414, 15414, 15513, 15513), class = "Date"), concept_id = c(4265600L, 
    201820L, 4265600L, 201820L, 201820L, 4265600L), event = c("comorb", 
    "comorb", "comorb", "comorb", "comorb", "comorb")), row.names = c(NA, 
    -6L), class = c("data.table", "data.frame"))