I have a df
in R
with multiple events
per individual. Each event
is currently defined by a unique date
. The column
diff_earliest_date
is the number
of days
between the earliest date
(by unique id
) and the subsequent date
.
data <- data.frame(
id = c(1234, 1234, 1234, 1234, 1234, 1234, 1234, 1234, 1234, 1234, 1234, 1234, 1234, 1234, 5678, 5678, 5678, 5678, 5678, 5678),
date = as.Date(c("1997-04-08", "1997-04-12", "1997-04-19", "1997-09-01", "1997-09-14", "1997-10-22", "1997-12-10",
"1998-06-16", "1998-06-27", "1998-06-29", "1998-09-13", "1998-09-25", "1999-05-17", "1999-07-18",
"1997-04-08", "1997-04-22", "1997-09-14", "1997-10-25", "1998-05-13", "1998-10-07")),
event = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 1, 2, 3, 4, 5, 6),
diff_earliest_date = c("0 days", "4 days", "11 days", "146 days", "159 days", "197 days", "246 days", "434 days",
"445 days", "447 days", "523 days", "535 days", "769 days", "831 days", "0 days", "10 days",
"159 days", "200 days", "400 days", "547 days"))
> data
id date event diff_earliest_date
1 1234 1997-04-08 1 0 days
2 1234 1997-04-12 2 4 days
3 1234 1997-04-19 3 11 days
4 1234 1997-09-01 4 146 days
5 1234 1997-09-14 5 159 days
6 1234 1997-10-22 6 197 days
7 1234 1997-12-10 7 246 days
8 1234 1998-06-16 8 434 days
9 1234 1998-06-27 9 445 days
10 1234 1998-06-29 10 447 days
11 1234 1998-09-13 11 523 days
12 1234 1998-09-25 12 535 days
13 1234 1999-05-17 13 769 days
14 1234 1999-07-18 14 831 days
15 5678 1997-04-08 1 0 days
16 5678 1997-04-22 2 10 days
17 5678 1997-09-14 3 159 days
18 5678 1997-10-25 4 200 days
19 5678 1998-05-13 5 400 days
20 5678 1998-10-07 6 547 days
For each id
, I want to define unique events
separated by 243 days apart. When the interval
between the earliest date
(by id
) and each subsequent date
is > 243, it will reset
the difference in days
for a new unique event
. The data
with the new columns
(diff_reset
and unique_event
) should look like this:
> data
id date event diff_earliest_date diff_reset unique_event
1 1234 1997-04-08 1 0 days 0 days 1
2 1234 1997-04-12 2 4 days 4 days 1
3 1234 1997-04-19 3 11 days 11 days 1
4 1234 1997-09-01 4 146 days 146 days 1
5 1234 1997-09-14 5 159 days 159 days 1
6 1234 1997-10-22 6 197 days 197 days 1
7 1234 1997-12-10 7 246 days 0 days 2
8 1234 1998-06-16 8 434 days 189 days 2
9 1234 1998-06-27 9 445 days 200 days 2
10 1234 1998-06-29 10 447 days 202 days 2
11 1234 1998-09-13 11 523 days 0 days 3
12 1234 1998-09-25 12 535 days 13 days 3
13 1234 1999-05-17 13 769 days 0 days 4
14 1234 1999-07-18 14 831 days 63 days 4
15 5678 1997-04-08 1 0 days 0 days 1
16 5678 1997-04-22 2 14 days 14 days 1
17 5678 1997-09-14 3 159 days 159 days 1
18 5678 1997-10-25 4 200 days 0 days 1
19 5678 1998-05-13 5 400 days 200 days 2
20 5678 1998-10-07 6 547 days 0 days 3
I think I can only achieve that by running a loop
through my data
. I have tried mutate
using date operations
and lag(date)
using dplyr
with no success so far. Very frustrating! I would very much appreciate your help.
Thanks.
You can use the integer division operator %/%
to get the number of 243 "chunks" away from the start that you are. For example
data %>%
group_by(id) %>%
mutate(unique_event = as.numeric(difftime(date, min(date), units="days"))%/%243+1)
Which gives
id date event diff_earliest_date unique_event
<dbl> <date> <dbl> <chr> <dbl>
1 1234 1997-04-08 1 0 days 1
2 1234 1997-04-12 2 4 days 1
3 1234 1997-04-19 3 11 days 1
4 1234 1997-09-01 4 146 days 1
5 1234 1997-09-14 5 159 days 1
6 1234 1997-10-22 6 197 days 1
7 1234 1997-12-10 7 246 days 2
8 1234 1998-06-16 8 434 days 2
9 1234 1998-06-27 9 445 days 2
10 1234 1998-06-29 10 447 days 2
11 1234 1998-09-13 12 523 days 3
12 1234 1998-09-25 12 535 days 3
13 1234 1999-05-17 13 769 days 4
14 1234 1999-07-18 14 831 days 4
15 5678 1997-04-08 1 0 days 1
16 5678 1997-04-22 2 10 days 1
17 5678 1997-09-14 3 159 days 1
18 5678 1997-10-25 4 200 days 1
19 5678 1998-05-13 5 400 days 2
20 5678 1998-10-07 6 547 days 3