rdataframerepeatrolling-sum

Rolling Sum by 3 Conditions in R based on another data frame


I've got a df and i need to perform a rolling sum in another dataframe based on the values of the first df

  1. All the transactions of each company, by date & account
  2. This is the dataframe I want to build with the balance at the end of the day with each company's account
Date Company Account Value
2021-01-01 A a 10
2021-01-01 A b 10
2021-01-01 A b 5
2021-01-02 A a 12
2021-01-02 A a 4
2021-01-02 A b 4
Date Company Account Value
2021-01-01 A a 10
2021-01-01 A b 15
2021-01-02 A a 26
2021-01-02 A b 14
2021-01-01 B x i
2021-01-01 B y i
2021-01-02 B x ii
2021-01-02 B y ii

In Excel this would be something like a SUMIF where you state that the criteria must me Company & Account % Date<=Date

Thnks in advance


Solution

  • using data.table

    library(data.table)
    
    setDT(df)
    
    df[, .(Value = sum(Value)), by = .(Date, Company, Account)][, .(Date, Value = cumsum(Value)), by = .(Company, Account)]
    

    results

    #    Company Account       Date Value
    # 1:       A       a 2021-01-01    10
    # 2:       A       a 2021-01-02    26
    # 3:       A       b 2021-01-01    15
    # 4:       A       b 2021-01-02    19
    

    data

    df <- structure(list(Date = structure(c(18628L, 18628L, 18628L, 18629L, 
    18629L, 18629L), class = c("IDate", "Date")), Company = c("A", 
    "A", "A", "A", "A", "A"), Account = c("a", "b", "b", "a", "a", 
    "b"), Value = c(10L, 10L, 5L, 12L, 4L, 4L)), row.names = c(NA, 
    -6L), class = c("data.frame"))
    

    EDIT update on OP's extra requirement in the comments

    # same code as above
    df <- df[, .(Value = sum(Value)), by = .(Date, Company, Account)][, .(Date, Value = cumsum(Value)), by = .(Company, Account)]
    
    # now create a small table with the full range of dates per group
    fullrange <- df[, .(Date = seq(min(Date), max(Date), "days")), .(Company, Account)]
    
    # merge the two tables, missing dates will here get NA as value
    output <- merge(df, fullrange, by = c("Company", "Account", "Date"), all = T)
    
    # now replace the NA values with the previous value
    setnafill(output, type = "locf", cols = "Value")
    
    output
    
    #    Company Account       Date Value
    # 1:       A       a 2021-01-01    10
    # 2:       A       a 2021-01-02    10
    # 3:       A       a 2021-01-03    26
    # 4:       A       b 2021-01-01    15
    # 5:       A       b 2021-01-02    15
    # 6:       A       b 2021-01-03    19
    
    # changed the data a bit and replaced the 2021-01-02 for 2021-01-03
    df <- structure(list(Date = structure(c(18628L, 18628L, 18628L, 18630L, 
    18630L, 18630L), class = "Date"), Company = c("A", "A", "A", 
    "A", "A", "A"), Account = c("a", "b", "b", "a", "a", "b"), Value = c(10L, 
    10L, 5L, 12L, 4L, 4L)), row.names = c(NA, -6L), class = c("data.frame"))