rloopsdateuniquedate-difference

Loop through unique id using difference between dates and time reset in R to obtain unique events


I have a df in R with multiple events per individual. Each event is currently defined by a unique date. The column diff_earliest_date is the number of days between the earliest date (by unique id) and the subsequent date.

data <- data.frame(
  id = c(1234, 1234, 1234, 1234, 1234, 1234, 1234, 1234, 1234, 1234, 1234, 1234, 1234, 1234, 5678, 5678, 5678, 5678, 5678, 5678),
  date = as.Date(c("1997-04-08", "1997-04-12", "1997-04-19", "1997-09-01", "1997-09-14", "1997-10-22", "1997-12-10", 
                   "1998-06-16", "1998-06-27", "1998-06-29", "1998-09-13", "1998-09-25", "1999-05-17", "1999-07-18", 
                   "1997-04-08", "1997-04-22", "1997-09-14", "1997-10-25", "1998-05-13", "1998-10-07")),
  event = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 1, 2, 3, 4, 5, 6),
  diff_earliest_date = c("0 days", "4 days", "11 days", "146 days", "159 days", "197 days", "246 days", "434 days", 
                         "445 days", "447 days", "523 days", "535 days", "769 days", "831 days", "0 days", "10 days", 
                         "159 days", "200 days", "400 days", "547 days"))

  > data
     id       date event diff_earliest_date
1  1234 1997-04-08     1             0 days
2  1234 1997-04-12     2             4 days
3  1234 1997-04-19     3            11 days
4  1234 1997-09-01     4           146 days
5  1234 1997-09-14     5           159 days
6  1234 1997-10-22     6           197 days
7  1234 1997-12-10     7           246 days
8  1234 1998-06-16     8           434 days
9  1234 1998-06-27     9           445 days
10 1234 1998-06-29    10           447 days
11 1234 1998-09-13    11           523 days
12 1234 1998-09-25    12           535 days
13 1234 1999-05-17    13           769 days
14 1234 1999-07-18    14           831 days
15 5678 1997-04-08     1             0 days
16 5678 1997-04-22     2            10 days
17 5678 1997-09-14     3           159 days
18 5678 1997-10-25     4           200 days
19 5678 1998-05-13     5           400 days
20 5678 1998-10-07     6           547 days

For each id, I want to define unique events separated by 243 days apart. When the interval between the earliest date (by id) and each subsequent date is > 243, it will reset the difference in days for a new unique event. The data with the new columns (diff_reset and unique_event) should look like this:

> data
     id       date event diff_earliest_date diff_reset unique_event
1  1234 1997-04-08     1             0 days     0 days            1
2  1234 1997-04-12     2             4 days     4 days            1
3  1234 1997-04-19     3            11 days    11 days            1
4  1234 1997-09-01     4           146 days   146 days            1
5  1234 1997-09-14     5           159 days   159 days            1
6  1234 1997-10-22     6           197 days   197 days            1
7  1234 1997-12-10     7           246 days     0 days            2
8  1234 1998-06-16     8           434 days   189 days            2
9  1234 1998-06-27     9           445 days   200 days            2
10 1234 1998-06-29    10           447 days   202 days            2
11 1234 1998-09-13    11           523 days     0 days            3
12 1234 1998-09-25    12           535 days    13 days            3
13 1234 1999-05-17    13           769 days     0 days            4
14 1234 1999-07-18    14           831 days    63 days            4
15 5678 1997-04-08     1             0 days     0 days            1
16 5678 1997-04-22     2            14 days    14 days            1
17 5678 1997-09-14     3           159 days   159 days            1
18 5678 1997-10-25     4           200 days     0 days            1
19 5678 1998-05-13     5           400 days   200 days            2
20 5678 1998-10-07     6           547 days     0 days            3

I think I can only achieve that by running a loop through my data. I have tried mutate using date operations and lag(date) using dplyr with no success so far. Very frustrating! I would very much appreciate your help.

Thanks.


Solution

  • You can use the integer division operator %/% to get the number of 243 "chunks" away from the start that you are. For example

    data %>% 
      group_by(id) %>% 
      mutate(unique_event = as.numeric(difftime(date, min(date), units="days"))%/%243+1)
    

    Which gives

          id date       event diff_earliest_date unique_event
       <dbl> <date>     <dbl> <chr>                     <dbl>
     1  1234 1997-04-08     1 0 days                        1
     2  1234 1997-04-12     2 4 days                        1
     3  1234 1997-04-19     3 11 days                       1
     4  1234 1997-09-01     4 146 days                      1
     5  1234 1997-09-14     5 159 days                      1
     6  1234 1997-10-22     6 197 days                      1
     7  1234 1997-12-10     7 246 days                      2
     8  1234 1998-06-16     8 434 days                      2
     9  1234 1998-06-27     9 445 days                      2
    10  1234 1998-06-29    10 447 days                      2
    11  1234 1998-09-13    12 523 days                      3
    12  1234 1998-09-25    12 535 days                      3
    13  1234 1999-05-17    13 769 days                      4
    14  1234 1999-07-18    14 831 days                      4
    15  5678 1997-04-08     1 0 days                        1
    16  5678 1997-04-22     2 10 days                       1
    17  5678 1997-09-14     3 159 days                      1
    18  5678 1997-10-25     4 200 days                      1
    19  5678 1998-05-13     5 400 days                      2
    20  5678 1998-10-07     6 547 days                      3