I've got a df and i need to perform a rolling sum in another dataframe based on the values of the first df
Date | Company | Account | Value |
---|---|---|---|
2021-01-01 | A | a | 10 |
2021-01-01 | A | b | 10 |
2021-01-01 | A | b | 5 |
2021-01-02 | A | a | 12 |
2021-01-02 | A | a | 4 |
2021-01-02 | A | b | 4 |
Date | Company | Account | Value |
---|---|---|---|
2021-01-01 | A | a | 10 |
2021-01-01 | A | b | 15 |
2021-01-02 | A | a | 26 |
2021-01-02 | A | b | 14 |
2021-01-01 | B | x | i |
2021-01-01 | B | y | i |
2021-01-02 | B | x | ii |
2021-01-02 | B | y | ii |
In Excel this would be something like a SUMIF where you state that the criteria must me Company & Account % Date<=Date
Thnks in advance
using data.table
library(data.table)
setDT(df)
df[, .(Value = sum(Value)), by = .(Date, Company, Account)][, .(Date, Value = cumsum(Value)), by = .(Company, Account)]
results
# Company Account Date Value
# 1: A a 2021-01-01 10
# 2: A a 2021-01-02 26
# 3: A b 2021-01-01 15
# 4: A b 2021-01-02 19
data
df <- structure(list(Date = structure(c(18628L, 18628L, 18628L, 18629L,
18629L, 18629L), class = c("IDate", "Date")), Company = c("A",
"A", "A", "A", "A", "A"), Account = c("a", "b", "b", "a", "a",
"b"), Value = c(10L, 10L, 5L, 12L, 4L, 4L)), row.names = c(NA,
-6L), class = c("data.frame"))
EDIT update on OP's extra requirement in the comments
# same code as above
df <- df[, .(Value = sum(Value)), by = .(Date, Company, Account)][, .(Date, Value = cumsum(Value)), by = .(Company, Account)]
# now create a small table with the full range of dates per group
fullrange <- df[, .(Date = seq(min(Date), max(Date), "days")), .(Company, Account)]
# merge the two tables, missing dates will here get NA as value
output <- merge(df, fullrange, by = c("Company", "Account", "Date"), all = T)
# now replace the NA values with the previous value
setnafill(output, type = "locf", cols = "Value")
output
# Company Account Date Value
# 1: A a 2021-01-01 10
# 2: A a 2021-01-02 10
# 3: A a 2021-01-03 26
# 4: A b 2021-01-01 15
# 5: A b 2021-01-02 15
# 6: A b 2021-01-03 19
# changed the data a bit and replaced the 2021-01-02 for 2021-01-03
df <- structure(list(Date = structure(c(18628L, 18628L, 18628L, 18630L,
18630L, 18630L), class = "Date"), Company = c("A", "A", "A",
"A", "A", "A"), Account = c("a", "b", "b", "a", "a", "b"), Value = c(10L,
10L, 5L, 12L, 4L, 4L)), row.names = c(NA, -6L), class = c("data.frame"))