How to make looping to be the fastest in R?

I am doing a simulation study. I am taking Exponential Distribution as an example.

A short introduction here... given that F(x) = 1 - e^(-λx), so x = (-ln[1-F(x)])/(λ). In R, I will write it as x = (-log(1-p))/(lambda) where F(x) is p. Same goes to its log-likelihood, L_X = n_x*log(lambda) - lambda*sum(x)

Because I am planning to run the iterations up to 1 million times, I am seeking for a faster version.

(The loop approach takes almost 10 hours, while the parallel version takes around 3.4 hours. I definitely need to improve the speed and efficiency.)

My objective is to collect all the value of "simulated lambda"

(1) Base Loop:

iterations <- 100
lambda = 35 #par[1]

simulation <- function(i) {
  
  p <- runif(1000, 0, 1)
  x <- (-log(1-p))/(lambda)
  n_x <- length(x)
  
  loglikelihood = function(par, x, y){
    return( -(n_x*log(par[1]) - par[1]*sum(x)))
  }
  
  result <- optim(c(10), fn = loglikelihood, x = x , method = "Brent",
                  lower = 0, upper = 100)
  
  return(result$par)
}

results_list <- lapply(1:iterations, simulation)                          
results <- do.call(rbind, results_list)
colnames(results) <- c("lambda_sim")
results

(2) Parallel Approach:

future::plan(future::multisession, workers = parallelly::availableCores() - 1)

iterations <- 100
lambda = 35 #par[1]

simulation <- function(i) {
  
  p <- runif(1000, 0, 1)
  x <- (-log(1-p))/(lambda)
  n_x <- length(x)
  
  loglikelihood = function(par, x, y){
    return( -(n_x*log(par[1]) - par[1]*sum(x)))
  }
  
  result <- optim(c(10), fn = loglikelihood, x = x , method = "Brent",
                  lower = 0, upper = 100)   

  return(result$par)
}

results_list <- future.apply::future_lapply(1:iterations, simulation, future.seed = TRUE) 
results <- do.call(rbind, results_list)
colnames(results) <- c("lambda_sim")

future::plan(future::sequential)

Not sure clusterApply() is the current fastest in R? Anyway I would like to ask how can we determine the initial value when using optim() in a new post...

Solution

Since you already know the distribution, i.e., exponential distribution, you can directly obtain a closed-form estimate of lambda with respect to given samples x.

f <- \(iter, n = 1e3) {
  replicate(iter, {
    p <- runif(n)
    x <- (-log(1 - p)) / (lambda)
    1 / mean(x)
  })
}

and you can see that

> system.time(f(1e6))
   user  system elapsed 
  52.22    0.38   52.85

and

> head(f(1e6))
[1] 34.24897 36.12484 34.06128 36.66326 35.39527 35.24301