I am trying to implement a sum of consecutive values that fall with 365 days of each other, grouped by a unique identifier in R. For example, for date 1 of a particular ID, we would add dates 2,3,4 (fall within 365 days) of the same ID to get a total cost for date 1. Then for date 2, we would add 3 and 4 to get that total cost and so on. I have tried several rolling sums (R dplyr rolling sum) and similar solutions from dplyr that take sum consecutive values (Calculate sum of a column if the difference between consecutive rows meets a condition) with certain constraints but could not get the code to distinguish the amount of days. I have included a sample dataset with a solution dataset for an example that I am looking for.
Starting Data Set
ID <- c(1,1,1,1,1,1,2,2,2,2,3)
admitdt <-c("2014-10-19","2014-10-24","2015-01-31","2016-01-20","2017-06-30","2017-07-17","2015-04-21","2015-04-22","2015-05-04","2015-07-25","2014-11-11")
cost<-c(2000,14077,5000,200,560,5000,888,5959,1819,7508,6406)
cost365<-c(21077,19077,5200,200,5560,5000,16174,15286,9327,7508,6406)
df2<-data.frame(ID,admitdt,cost,cost365)
ID admitdt cost
1 1 2014-10-19 2000
2 1 2014-10-24 14077
3 1 2015-01-31 5000
4 1 2016-01-20 200
5 1 2017-06-30 560
6 1 2017-07-17 5000
7 2 2015-04-21 888
8 2 2015-04-22 5959
9 2 2015-05-04 1819
10 2 2015-07-25 7508
11 3 2014-11-11 6406
Solution:
ID <- c(1,1,1,1,1,1,2,2,2,2,3)
admitdt <-c("2014-10-19","2014-10-24","2015-01-31","2016-01-20","2017-06-30","2017-07-17","2015-04-21","2015-04-22","2015-05-04","2015-07-25","2014-11-11")
cost<-c(2000,14077,5000,200,560,500,888,5959,1819,7508,6406)
cost365<-c(21077,19077,5200,200,5560,5000,16174,15286,9327,7508,6406)
df2<-data.frame(ID,admitdt,cost,cost365)
ID admitdt cost cost365
1 1 2014-10-19 2000 21077
2 1 2014-10-24 14077 19077
3 1 2015-01-31 5000 5200
4 1 2016-01-20 200 200
5 1 2017-06-30 560 5560
6 1 2017-07-17 5000 5000
7 2 2015-04-21 888 16174
8 2 2015-04-22 5959 15286
9 2 2015-05-04 1819 9327
10 2 2015-07-25 7508 7508
11 3 2014-11-11 6406 6406
Here's an approach with purrr::map
:
library(dplyr); library(purrr)
df2 %>%
mutate(admitdt = as.Date(admitdt)) %>%
group_by(ID) %>%
mutate(cost365 = map_dbl(admitdt,~sum(cost[(.x - admitdt) <= 0 &
(.x - admitdt) >= -365])))
# A tibble: 11 x 4
# Groups: ID [3]
ID admitdt cost cost365
<dbl> <date> <dbl> <dbl>
1 1 2014-10-19 2000 21077
2 1 2014-10-24 14077 19077
3 1 2015-01-31 5000 5200
4 1 2016-01-20 200 200
5 1 2017-06-30 560 1060
6 1 2017-07-17 500 500
7 2 2015-04-21 888 16174
8 2 2015-04-22 5959 15286
9 2 2015-05-04 1819 9327
10 2 2015-07-25 7508 7508
11 3 2014-11-11 6406 6406