I want to create a new column holding a running sum of only the values greater than 0. I have a data frame:
df=data.frame(year=c('2007-04-01','2007-04-02','2007-04-03','2007-04-04','2007-04-05','2007-04-06'),air.temp=c(1,2,-1,3,1,0))
and I want to create:
df=data.frame(year=c('2007-04-01','2007-04-02','2007-04-03','2007-04-04','2007-04-05','2007-04-06'),air.temp=c(1,2,-1,3,1,0),temp.sum=c(1,3,3,6,7,7))
So far I have tried:
library(dplyr)  # provides if_else()
df$temp.sum <- if_else(df$air.temp > 0, cumsum(df$air.temp), 0)
This resulted in:
temp.sum=c(1,3,0,5,6,0)
How can I skip values at or below 0 while carrying the running sum forward unchanged? My dataset has 100,000+ observations, so simple, efficient suggestions are appreciated!
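A quick check of why the attempt goes wrong: cumsum() runs over every value, including the -1, so the sum is already reduced before the conditional is applied; the conditional then only overwrites positions with 0. This sketch uses base R's ifelse() instead of dplyr's if_else() to stay dependency-free (an assumption on my part; both behave the same on this input).

```r
# The question's temperatures
air.temp <- c(1, 2, -1, 3, 1, 0)

# cumsum() includes the -1, so the running total drops from 3 to 2
running <- cumsum(air.temp)
print(running)
# [1] 1 3 2 5 6 6

# The conditional is applied AFTER summing: it only writes 0 where
# air.temp <= 0 and cannot undo the -1 already folded into later totals
masked <- ifelse(air.temp > 0, running, 0)
print(masked)
# [1] 1 3 0 5 6 0
```

The fix is therefore to mask the values before summing, not the sums afterwards.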
Use the parallel maximum, pmax(), to replace negative values with 0, then take the cumulative sum as before:
cumsum(pmax(df$air.temp, 0))
#[1] 1 3 3 6 7 7
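A self-contained sketch that rebuilds the question's data frame and stores the result as the new temp.sum column:

```r
# Rebuild the question's data frame
df <- data.frame(
  year = c('2007-04-01', '2007-04-02', '2007-04-03',
           '2007-04-04', '2007-04-05', '2007-04-06'),
  air.temp = c(1, 2, -1, 3, 1, 0)
)

# pmax() clamps each value at or below 0 to 0 (elementwise),
# so cumsum() simply carries the total forward on those rows
df$temp.sum <- cumsum(pmax(df$air.temp, 0))
print(df)
```

Both pmax() and cumsum() are vectorised over the whole column, so no loop or grouping is needed.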
It is fast even on 1.2 million values:
x <- rep(df$air.temp, 2e5)
system.time(cumsum(pmax(x, 0)))
## user system elapsed
## 0 0 0
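If you prefer to avoid pmax(), multiplying by the logical mask x > 0 zeroes the same entries before summing (a swapped-in alternative, not part of the answer above). A sketch confirming that both forms agree on the same 1.2M-element vector used in the timing test:

```r
# Same 1.2M-element vector as in the timing test
x <- rep(c(1, 2, -1, 3, 1, 0), 2e5)

# Answer's version: clamp with pmax(), then accumulate
a <- cumsum(pmax(x, 0))

# Alternative: TRUE/FALSE coerce to 1/0, so non-positive values become 0
b <- cumsum(x * (x > 0))

# Both add the same values in the same order, so the results match exactly
stopifnot(identical(a, b))
```

Each block of six values contributes 1 + 2 + 3 + 1 = 7, so the final total is 7 * 2e5.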