I am numerically integrating with integrate(). The function I am integrating depends on several parameters, and I would like to calculate the integral for a large number of parameter values. I currently do this using the mapply function:
fc <- function(x, a, b) a*x + b
fc.wrapper <- function(p.a, p.b) integrate(fc, 0, 1, a = p.a, b = p.b)$value
a.vec <- seq(0, 1, length.out = 100)
b.vec <- seq(0, 1, length.out = 100)
system.time(mapply(fc.wrapper, a.vec, b.vec))
   user  system elapsed
  0.003   0.000   0.004
The problem is that, in my application, this takes too long to run. So my question is: are there faster alternatives to mapply() here? I have tried future_mapply(), but it doesn't seem to speed things up.
Here is my attempt with future.apply:
library(future.apply)
plan(multisession, workers = 6L)
system.time(future_mapply(fc.wrapper, a.vec, b.vec, future.seed = TRUE))
   user  system elapsed
  0.315   0.008   0.752
I am also open to suggestions for faster user-written alternatives to integrate().
For a 1-D smooth function, and with enough memory, you can use straight non-adaptive Gauss-Legendre quadrature with a relatively large number of nodes to vectorize the integration: each integral becomes a weighted sum of integrand values at a fixed set of nodes, so the whole parameter grid can be evaluated in one vectorized pass. I'll demonstrate with a function of moderate computational cost:
f1 <- function(x, A, B, C, D) {
  (A*lgamma(x) - B*log(x) + C*qgamma(pcauchy(x), 5))*exp(sin(D*x))
}
a <- 0; b <- 100 # integration extents
A <- B <- C <- D <- seq(0, 1, length.out = 20) # parameter values
I'll explore the full 20^4 parameter space using expand.grid:
system.time({
  ba2 <- (b - a)/2 # half-width, to map the extents from [-1, 1] to [a, b]
  nodes <- statmod::gauss.quad(100) # calculate the nodes and weights
  x <- nodes[[1]]*ba2 + (b + a)/2 # nodes mapped to [a, b]
  w <- nodes[[2]]*ba2 # weights rescaled accordingly
  df1 <- expand.grid(x = x, A = A, B = B, C = C, D = D)
  df1 <- cbind(
    expand.grid(A = A, B = B, C = C, D = D),
    # evaluate f1 at every node/parameter combination, reshape into a
    # length(x)-row matrix (one column per parameter combination), and take
    # weighted column sums to get all 20^4 integrals at once
    v = c(crossprod(w, matrix(do.call(f1, df1), length(x))))
  )
})
#> user system elapsed
#> 23.01 0.12 23.15
Spot-check the integration against integrate():
sum(f1(x, 0.5, 0.5, 0.5, 0.5)*w)
#> [1] 9919.591
integrate(\(x) f1(x, 0.5, 0.5, 0.5, 0.5), a, b, rel.tol = 1e-6)
#> 9919.598 with absolute error < 0.00023
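As an aside, the same machinery handles the simple linear integrand from the question: an n-node Gauss-Legendre rule is exact for polynomials of degree up to 2n - 1, so two nodes already integrate a*x + b exactly. A minimal sketch, assuming a.vec and b.vec from the question are in scope (the outer expression is just fc vectorized over the parameter pairs):
nodes <- statmod::gauss.quad(2)
x0 <- nodes[[1]]/2 + 1/2 # nodes mapped from [-1, 1] to [0, 1]
w0 <- nodes[[2]]/2 # weights rescaled accordingly
# column j of the matrix is fc(x0, a.vec[j], b.vec[j])
v0 <- c(crossprod(w0, outer(x0, a.vec) + rep(b.vec, each = length(x0))))
all.equal(v0, a.vec/2 + b.vec) # analytic result is a/2 + b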
If the function is amenable to broadcasting, the integration can be sped up even more using outer:
f2 <- function(x, A, B, C, D) {
  (outer(lgamma(x), A) - outer(log(x), B) + outer(qgamma(pcauchy(x), 5), C))*
    exp(sin(outer(x, D)))
}
system.time({
  ba2 <- (b - a)/2
  nodes <- statmod::gauss.quad(100)
  x <- nodes[[1]]*ba2 + (b + a)/2
  w <- nodes[[2]]*ba2
  df2 <- expand.grid(A = A, B = B, C = C, D = D)
  # f2 already returns a length(x)-row matrix, so no reshaping is needed
  df2 <- transform(df2, v = c(crossprod(w, f2(x, A, B, C, D))))
})
#> user system elapsed
#> 1.45 0.05 1.52
all.equal(df1$v, df2$v)
#> [1] TRUE
This is all using a single thread. It would be straightforward to do the above in parallel.
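For instance, here is a sketch using the future.apply package from the question, splitting the parameter grid into a handful of chunks so that each worker performs one large vectorized evaluation instead of many tiny tasks (the chunk and worker counts are illustrative; per-task overhead is also the likely reason future_mapply() over individual integrate() calls showed no speedup):
library(future.apply)
plan(multisession, workers = 6L)
grid <- expand.grid(A = A, B = B, C = C, D = D)
idx <- split(seq_len(nrow(grid)), cut(seq_len(nrow(grid)), 6L)) # 6 chunks
v.par <- unlist(future_lapply(idx, function(i) {
  c(crossprod(w, with(grid[i, ], f2(x, A, B, C, D))))
}), use.names = FALSE)
all.equal(df2$v, v.par) # should match the single-threaded result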
Compare to the mapply approach from the question:
system.time({
  f3 <- function(A, B, C, D) {
    integrate(f1, a, b, A = A, B = B, C = C, D = D)$value
  }
  df3 <- expand.grid(A = A, B = B, C = C, D = D)
  df3 <- transform(df3, v = mapply(f3, A, B, C, D))
})
#> user system elapsed
#> 150.28 0.04 150.58
all.equal(df1$v, df3$v)
#> [1] "Mean relative difference: 2.018989e-05"