r  if-statement  cumsum

Cumulative sum in R if value greater than 0


I want to create a new column with a running sum of every value greater than 0. I have a dataframe:

df=data.frame(year=c('2007-04-01','2007-04-02','2007-04-03','2007-04-04','2007-04-05','2007-04-06'),air.temp=c(1,2,-1,3,1,0))

and I want to create:

df=data.frame(year=c('2007-04-01','2007-04-02','2007-04-03','2007-04-04','2007-04-05','2007-04-06'),air.temp=c(1,2,-1,3,1,0),temp.sum=c(1,3,3,6,7,7)) 

So far I have tried:

df$temp.sum <- if_else(df$air.temp > 0, cumsum(df$air.temp), 0)

Which resulted in:

temp.sum=c(1,3,0,5,6,0)

How do I not count values at or below 0, without changing the running sum? My dataset is 100,000+ observations, so simple suggestions are helpful!


Solution

  • Use pmax() (the parallel maximum) to replace negative values with 0, then take the cumulative sum of the result.

    cumsum(pmax(df$air.temp, 0))
    #[1] 1 3 3 6 7 7
    

    Seems very quick on 1.2M values:

    x <- rep(df$air.temp, 2e5)
    system.time(cumsum(pmax(x, 0)))
    ##   user  system elapsed 
    ##      0       0       0
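
To get the column asked for in the question, the result can be assigned back to the data frame. The sketch below uses the question's sample data and also shows why the original attempt went wrong. It assumes base R only, so `ifelse` stands in for dplyr's `if_else`, and the `attempt` variable is just an illustrative name.

```r
# Sample data from the question
df <- data.frame(
  year = c('2007-04-01', '2007-04-02', '2007-04-03',
           '2007-04-04', '2007-04-05', '2007-04-06'),
  air.temp = c(1, 2, -1, 3, 1, 0)
)

# Clamp values below 0 to 0 first, then accumulate
df$temp.sum <- cumsum(pmax(df$air.temp, 0))
df$temp.sum
#[1] 1 3 3 6 7 7

# The attempt in the question accumulates first and masks afterwards,
# so the -1 has already lowered the running total by the time it is hidden
attempt <- ifelse(df$air.temp > 0, cumsum(df$air.temp), 0)
attempt
#[1] 1 3 0 5 6 0
```

The order of operations is the whole difference: zeroing the unwanted values before `cumsum` keeps them out of the running total, while masking afterwards only hides rows of a total that already includes them.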