rcolumn-sum

Stack Based Column Sum in a data frame using R


My existing data frame looks line given below

NAV_Date    NAV    Year  Day       Units   Amount  Balance_Units
2013-06-01  282.5  2013  Saturday  3.540   1000    3.540
2013-06-08  279.3  2013  Saturday  3.581   1000    3.581
2013-06-15  276.0  2013  Saturday  3.623   1000    3.623
2013-06-22  261.6  2013  Saturday  3.822   1000    3.822
2013-06-29  273.3  2013  Saturday  3.659   1000    3.659

I want my new data frame to have Balance_Units column have entries like given below using R Language i.e Balance unit column should be the sum of previous value and present value This is need to be done on the list of dataframe

NAV_Date    NAV    Year  Day       Units   Amount  Balance_Units
2013-06-01  282.5  2013  Saturday  3.540   1000    3.540
2013-06-08  279.3  2013  Saturday  3.581   1000    7.121
2013-06-15  276.0  2013  Saturday  3.623   1000    10.744
2013-06-22  261.6  2013  Saturday  3.822   1000    14.566
2013-06-29  273.3  2013  Saturday  3.659   1000    18.225

I tried this but , this is not working

 for( i in 1:length(W)) {
  W[[i]]$Units  = 1000/W[[i]]$NAV
  W[[i]]$Amount = 1000
  W[[i]]$Balance_Units = 0
  W[[i]]$Balance_Units = W[[i]]$Units + W[[i]]$Balance_Units
   }

Solution

  • This can be done with a convenient function cumsum in base R

    df1$Balance_Units <- cumsum(df1$Balance_Units)
    

    With dplyr, it can be created within mutate

    library(dplyr)
    df1 %>%
        mutate(Balance_Units = cumsum(Balance_Units))
    

    If 'W' is a list of data.frames, we can use lapply

    W <- lapply(W, transform, Balance_Units = cumsum(Balance_Units))
    

    data

    df1 <- structure(list(NAV_Date = c("2013-06-01", "2013-06-08", "2013-06-15", 
    "2013-06-22", "2013-06-29"), NAV = c(282.5, 279.3, 276, 261.6, 
    273.3), Year = c(2013L, 2013L, 2013L, 2013L, 2013L), Day = c("Saturday", 
    "Saturday", "Saturday", "Saturday", "Saturday"), Units = c(3.54, 
    3.581, 3.623, 3.822, 3.659), Amount = c(1000L, 1000L, 1000L, 
    1000L, 1000L), Balance_Units = c(3.54, 3.581, 3.623, 3.822, 3.659
    )), class = "data.frame", row.names = c(NA, -5L))