Using the default arguments, what is the time complexity of sample()? I.e., how does the running time of sample(1:N) grow with N?
The documentation for sample is here, but it does not specify time complexity.
It looks linear once N > 30,000 in my testing: the average time to draw 10 samples from 1:N settles at around 0.5 nanoseconds per value in the range. (Note the unit: microbenchmark records its timings in nanoseconds, regardless of the unit you ask it to print in.) For lower N there is fixed overhead that makes the time per value higher. One plausible explanation for the linear growth is that sample() without replacement builds a pool spanning the whole range and then runs a partial Fisher-Yates shuffle, so the O(N) setup dominates no matter how few values you draw.
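To make that explanation concrete, here is a minimal sketch of a partial Fisher-Yates sampler (the function name and code are mine, for illustration only; this is not a claim about R's exact internals). The point is that the length-n pool is built in full, O(n), even though only k swap-and-pick steps follow:

partial_fisher_yates <- function(n, k) {
  pool <- seq_len(n)                        # O(n): allocate and fill the whole range
  for (i in seq_len(k)) {                   # only k swap steps, O(k)
    j <- i + floor(runif(1) * (n - i + 1))  # random index in i..n
    tmp <- pool[i]; pool[i] <- pool[j]; pool[j] <- tmp
  }
  pool[seq_len(k)]                          # the k sampled values
}

For example, partial_fisher_yates(10^6, 10) does only 10 swaps, but it still pays to allocate and fill a million-element pool first.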
sample_range <- 10^seq(2, 7, 0.05)

take_sample <- function(n) {
  test <- microbenchmark::microbenchmark(
    sample(n, 10), times = 100)
  # microbenchmark's $time column is always in nanoseconds; the unit
  # argument only affects printing, so avg here is in nanoseconds
  avg <- mean(test$time)
  data.frame(n, avg)
}

# takes about 20 sec on my machine
sample_results <- lapply(sample_range, take_sample)
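As a quick sanity check of the O(N) idea (my addition, not part of the benchmark above): if the cost is dominated by the size of the range rather than by the number of draws, asking for a thousand times more values from the same range should barely move the timings:

microbenchmark::microbenchmark(
  sample(10^7, 10),
  sample(10^7, 10^4),
  times = 20
)

If sampling were instead O(k) in the number of draws, the second expression would come out roughly a thousand times slower.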
Then to plot those benchmark results:
library(tidyverse)

bind_rows(sample_results) |>
  mutate(nanosec_per_range = avg / n) |>  # time per element of 1:n, in nanoseconds
  ggplot(aes(n, nanosec_per_range)) +
  geom_point() +
  expand_limits(y = 0) +
  scale_x_log10(breaks = c(10^(2:7), 3 * 10^(2:7)),
                labels = scales::comma_format()) +
  scale_y_continuous(breaks = c(0, 0.3, 1, 3, 10, 30, 100),
                     trans = scales::pseudo_log_trans(sigma = 0.1))
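To put a number on the linear regime, you could also fit a straight line to the large-N portion of the same results (a rough sketch reusing sample_results from above; the 30,000 cutoff echoes where the curve flattens):

results <- dplyr::bind_rows(sample_results)
fit <- lm(avg ~ n, data = subset(results, n > 3e4))
coef(fit)[["n"]]  # approximate cost in nanoseconds per element of 1:n

If the per-value figure quoted above is right, this slope should come out somewhere near 0.5.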