rprecision

efficient cumulative log_sum_exp


I have a dataframe with 1M rows in which one column represents the log posterior probability. Many of these values are very close to 0 when exponentiated, so I have to use log_sum_exp() rather than log(sum(exp())) to obtain accurate results. However, I would like to obtain the cumulative probability density implied by these posterior probabilities. If simple summing would work, I could use log(cumsum(exp))) but for reasons specified above, this does not give accurate results. Currently I am running a for-loop:

for(k in 1:nrow(y)){
  y$cumlogsum[k] <- log_sum_exp(y$log_pp[1:k])
}

This is obviously hopelessly inefficient and takes hours to run. Is there an efficient way to do this in R, e.g. by vectorizing?


Solution

  • The log-sum-exp trick is:

    log_sum_exp <- function(x) {
      max.x <- max(x)
      return(max.x + log(sum(exp(x - max.x))))
    }
    

    And a cumsum version would simply be:

    log_cumsum_exp <- function(x) {
      max.x <- max(x)
      return(max.x + log(cumsum(exp(x - max.x))))
    }