Out of this SO post resulted a discussion when benchmarking the various solutions. Consider the following code
# global environment is empty - new session just started
# set up
set.seed(20181231)
n <- sample(10^3:10^4,10^3)
for_loop <- function(n) {
out <- integer(length(n))
for(k in 1:length(out)) {
if((k %% 2) == 0){
out[k] <- 0L
next
}
out[k] <- 1L
next
}
out
}
# benchmarking
res <- microbenchmark::microbenchmark(
for_loop = {
out <- integer(length(n))
for(k in 1:length(out)) {
if((k %% 2) == 0){
out[k] <- 0L
next
}
out[k] <- 1L
next
}
out
},
for_loop(n),
times = 10^4
)
Here are the benchmarking results for the exact same loops, one packed in a function, the other not
# Unit: microseconds
# expr min lq mean median uq max neval cld
# for_loop 3216.773 3615.360 4120.3772 3759.771 4261.377 34388.95 10000 b
# for_loop(n) 162.280 180.149 225.8061 190.724 211.875 26991.58 10000 a
ggplot2::autoplot(res)
As can be seen, there is a drastic difference in efficiency. What is the underlying reason for this?
To be clear, the question is not about the task solved by the above code (which could be done much more elegantly) but merely about efficiency discrepancy between a regular loop and a loop wrapped inside a function.
The explanation is that functions are "just-in-time" compiled, whereas interpreted code is not. See ?compiler::enableJIT
for a description.
If you want a demonstration of the difference, run
compiler::enableJIT(0)
before any of your code (including the creation of the for_loop
function). This disables JIT compiling for the rest of that session. Then the timing will be much more similar for the two sets of code.
You have to do this before the creation of the for_loop
function, because once it gets compiled by the JIT compiler, it will stay compiled, whether JIT is enabled or not.