rshufflesample

What is the time complexity of sample?


Using the default arguments, what is the time complexity of sample? I.e. how does the running time of sample(1:N) grow with N?

Documentation for sample is here but does not specify time complexity.


Solution

  • Looks linear once N > 30,000 in my testing, where average sampling time for 10 samples out of 1:N is around 0.5 microseconds per value in the range. For lower N there seems to be overhead which makes the time per sample higher.

    sample_range = 10^seq(2, 7, 0.05)
    
    take_sample <- function(n) {
        test <- microbenchmark::microbenchmark(
                  sample(n, 10), times = 100, unit = "microseconds")
        avg = mean(test$time)
        data.frame(n, avg)
      }
    
    # takes about 20 sec on my machine
    sample_results <- lapply(sample_range, take_sample)
    

    Then to show those results:

    library(tidyverse)
    bind_rows(sample_results) |>
      mutate(microsec_per_range = avg / n) |>
      ggplot(aes(n, microsec_per_range)) +
      geom_point() +
      expand_limits(y = 0) +
      scale_x_log10(breaks = c(10^(2:7), 3*10^(2:7)),
                    labels = scales::comma_format()) +
      scale_y_continuous(breaks = c(0, 0.3, 1, 3, 10, 30, 100),
                         trans = scales::pseudo_log_trans(sigma = 0.1))
    

    enter image description here