Why does my benchmarking show some oscillation with inceasing sample size?

I made a benchmark with microbenchmark::microbenchmark() for two functions and varied the sample size entered in the functions. While doing so I noticed that every time there is some oscillation in the performance time. Here is a plot of the benchmark of a simple while function:

fun <- function(n){
  i <- 1
  while (n > i) {
    i <- i + 1
  }
}

x axis is the sample size (n), in log scale, and y axis shows the median performance times. I would expect that the time to run the function should increase when n increases but this is not the case. I run the code multiple times and always get the same picture: There is some oscillation. But why?

The full code used to generate the data and plot is:

# Load libraries.
library(magrittr)
library(ggplot2)
# Define some function.
fun <- function(n){
  i <- 1
  while (n > i) {
    i <- i + 1
  }
}
# Set sample sizes.
n <- c(10, 100, 1000, 10000, 100000, 1000000)
# Run per predefined sample sizes..
results<- lapply(n, function(n_i){
  results_i <- microbenchmark::microbenchmark(fun(n= n_i),
                                          times= 1000)

  data.frame(n= n_i,
             time= summary(results_i)$median)
}) %>%
  do.call("rbind.data.frame", .)

# Produce a plot.
results %>%
  ggplot(data= ., mapping= aes(x= n, y= time)) +
  geom_smooth(se= FALSE) +
  geom_point() +
  scale_x_continuous(trans='log10') +
  theme_minimal()

Solution

This is not what it appears. There are cases where the median microbenchmark() times can be misleading. Benchmarked expressions are by default randomly shuffled, rather than being executed sequentially. This is to avoid results appearing faster than they would be if executed only once, owing to optimizations like processor caching. But it means expensive garbage collection thresholds can be reached by one call, with the cost is paid by the next call, which may be a different one.

However, this is not one of those examples. Your problem is that when you do time = summary(results_i)$median), you are extracting the numeric value of the median time but not accounting for changes in time units. Compare:

microbenchmark::microbenchmark(
    fun(n= 1e4),
    times= 10
)
# Unit: microseconds
#            expr   min     lq    mean  median     uq    max neval
#  fun(n = 10000) 959.1 1111.6 1141.56 1150.25 1218.2 1235.1    10

However, when you step this up to 1e5 you are now looking at milliseconds:

microbenchmark::microbenchmark(
    fun(n= 1e5),
    times= 10
)
# Unit: milliseconds
#            expr    min     lq    mean  median      uq     max neval
#  fun(n = 1e+05) 5.9059 6.7059 9.54744 10.2045 11.1353 13.9593    10

To ensure the same time units, either use the unit parameter in your call to microbenchmark::microbenchmark() or, as below, use the time column which is the measured execution time of the expression in nanoseconds.

results <- lapply(
    n, 
    function(n_i) {
       microbenchmark::microbenchmark(
        fun(n= n_i),
        times= 100
      ) |> 
      transform(n = n_i)
}) |>
  do.call(rbind, args = _)

results |>
    dplyr::summarise(
        median = median(time),
        .by = n
    ) |>
  ggplot(mapping= aes(x= n, y= median)) +
  geom_smooth(se = FALSE)
  geom_point() +
  scale_x_continuous(trans='log10') +
  theme_minimal()

This produces results as expected: