I am doing a simulation study. I am taking Exponential Distribution as an example.
A short introduction here... given that F(x) = 1 - e^(-λx)
, so x = (-ln[1-F(x)])/(λ)
. In R, I will write it as x = (-log(1-p))/(lambda)
where F(x)
is p
. Same goes to its log-likelihood, L_X = n_x*log(lambda) - lambda*sum(x)
Because I am planning to run the iterations up to 1 million times, I am seeking for a faster version.
(The loop approach takes almost 10 hours, while the parallel version takes around 3.4 hours. I definitely need to improve the speed and efficiency.)
My objective is to collect all the value of "simulated lambda"
(1) Base Loop:
iterations <- 100
lambda = 35 #par[1]
simulation <- function(i) {
p <- runif(1000, 0, 1)
x <- (-log(1-p))/(lambda)
n_x <- length(x)
loglikelihood = function(par, x, y){
return( -(n_x*log(par[1]) - par[1]*sum(x)))
}
result <- optim(c(10), fn = loglikelihood, x = x , method = "Brent",
lower = 0, upper = 100)
return(result$par)
}
results_list <- lapply(1:iterations, simulation)
results <- do.call(rbind, results_list)
colnames(results) <- c("lambda_sim")
results
(2) Parallel Approach:
future::plan(future::multisession, workers = parallelly::availableCores() - 1)
iterations <- 100
lambda = 35 #par[1]
simulation <- function(i) {
p <- runif(1000, 0, 1)
x <- (-log(1-p))/(lambda)
n_x <- length(x)
loglikelihood = function(par, x, y){
return( -(n_x*log(par[1]) - par[1]*sum(x)))
}
result <- optim(c(10), fn = loglikelihood, x = x , method = "Brent",
lower = 0, upper = 100)
return(result$par)
}
results_list <- future.apply::future_lapply(1:iterations, simulation, future.seed = TRUE)
results <- do.call(rbind, results_list)
colnames(results) <- c("lambda_sim")
future::plan(future::sequential)
Not sure clusterApply()
is the current fastest in R? Anyway I would like to ask how can we determine the initial value when using optim()
in a new post...
Since you already know the distribution, i.e., exponential distribution, you can directly obtain a closed-form estimate of lambda
with respect to given samples x
.
f <- \(iter, n = 1e3) {
replicate(iter, {
p <- runif(n)
x <- (-log(1 - p)) / (lambda)
1 / mean(x)
})
}
and you can see that
> system.time(f(1e6))
user system elapsed
52.22 0.38 52.85
and
> head(f(1e6))
[1] 34.24897 36.12484 34.06128 36.66326 35.39527 35.24301