Replacing for-loops with apply to improve performance (with weighted.mean)


I am new to R, so hopefully this is a solvable problem for some of you. I have a data frame containing more than a million data points. My goal is to compute a weighted mean with a varying starting point.

To illustrate, consider this data frame, created with A <- data.frame(matrix(c(1,2,3,2,2,1),3,2)):

  X1 X2
1  1  2
2  2  2
3  3  1

where X1 is the data and X2 is the sampling weight.

I want to compute the weighted mean of X1 over rows 1:3, then over rows 2:3, and finally over rows 3:3.

With a loop, I simply wrote:

B <- rep(NA,3) #empty result vector
for(i in 1:3){
  B[i] <- weighted.mean(x=A$X1[i:3],w=A$X2[i:3]) #shifting the starting point of the data and weights further to the end
} 

With my real data, this is impractical: every iteration subsets the data frame again, and the computation runs for hours without producing a result.

Is there a way to implement the varying starting point with an apply-style function to improve performance?


Solution

  • Building upon @joran's answer to produce the correct result:

    with(A, rev(cumsum(rev(X1*X2)) / cumsum(rev(X2))))
    # [1] 1.800000 2.333333 3.000000
    

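    Also note that this is much faster than the sapply/lapply approach: the cumulative sums are computed in a single pass over the data, instead of re-summing the remaining rows for every starting point.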
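
To check the one-liner against a per-start-point computation, here is a minimal sketch, assuming the three-row example data frame A from the question. The names loop_result, fast_result, suffix_wx and suffix_w are illustrative and do not come from the original posts.

```r
# Example data from the question: X1 is the data, X2 the sampling weight.
A <- data.frame(X1 = c(1, 2, 3), X2 = c(2, 2, 1))
n <- nrow(A)

# Reference: one weighted.mean() call per starting point.
# Each call re-reads rows i..n, so the total work grows quadratically with n.
loop_result <- sapply(seq_len(n), function(i) {
  weighted.mean(x = A$X1[i:n], w = A$X2[i:n])
})

# Vectorised: reversing, taking cumsum(), and reversing back yields,
# at position i, the sum over rows i..n.
suffix_wx <- rev(cumsum(rev(A$X1 * A$X2)))  # sum of X1 * X2 over rows i..n
suffix_w  <- rev(cumsum(rev(A$X2)))         # sum of X2 over rows i..n
fast_result <- suffix_wx / suffix_w

print(fast_result)

# Both approaches should agree up to floating-point tolerance.
stopifnot(isTRUE(all.equal(loop_result, fast_result)))
```

On large inputs the two results may differ in the last few digits, because the sums are accumulated in a different order. A position where the remaining weights sum to zero will give NaN or Inf in the vectorised version.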