I need a fast and efficient way of sampling with replacement for my bootstrapping exercise.
I found a similar question here, but its solution doesn't give me enough of a speed-up.
Here is an example of what I am doing:
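```r
library(data.table)
library(dqrng)  # provides dqsample.int(), a faster drop-in for sample.int()

dt <- data.table(x = runif(50000))
samples <- 1000
# create a matrix of results that I will populate in my loop
# here I have done 1,000 samples but ideally I want to do 10,000
p <- matrix(NA_integer_, 3000, samples)
system.time({
  # a for loop, so p[, s] <- ... updates p itself (inside lapply() it only changes a local copy)
  for (s in seq_len(samples)) {
    # draw 3,000 row indices of dt with replacement
    include <- dqsample.int(nrow(dt), 3000, replace = TRUE)
    #sample_data <- dt[include,]
    #return <- sample_data[, get(profit_col)]
    p[, s] <- include
  }
})
```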
This takes around 10 seconds to run. On the actual data it takes about 200 seconds, because I do further calculations on each sample once it has been drawn.
I then need to run this about 90 times, trying different combinations of variables, so in practice it takes too long to wait for.
Is there a way to run this in parallel (I am running Windows 11), or to compile it into a function?
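On the parallel part of the question: Windows cannot fork processes, so `parallel::mclapply()` only works with `mc.cores = 1` there, whereas a socket (PSOCK) cluster from `parallel::makeCluster()` works on every platform. The following is a minimal sketch under that assumption. It uses a plain vector `x` standing in for `dt$x`, and the per-sample `mean()` is only a placeholder for whatever calculation is really done on each sample. `n_workers`, `size` and `boot_means` are illustrative names.

```r
library(parallel)

x <- runif(50000)  # stands in for dt$x
samples <- 1000    # number of bootstrap samples
size <- 3000       # values drawn per sample

# PSOCK cluster: works on Windows, unlike forking; capped at 4 workers for this sketch
n_workers <- max(1L, min(4L, detectCores() - 1L), na.rm = TRUE)
cl <- makeCluster(n_workers)

# give each worker its own reproducible L'Ecuyer random-number stream
clusterSetRNGStream(cl, iseed = 123)

# copy the data to the workers once, rather than with every task
clusterExport(cl, c("x", "size"))

# each task draws one bootstrap sample and returns one statistic
boot_means <- parSapply(cl, seq_len(samples), function(s) {
  mean(sample(x, size, replace = TRUE))
})

stopCluster(cl)
```

`parSapply()` splits the 1,000 tasks into one chunk per worker, so the communication overhead is paid only a few times. The gain is largest when the per-sample calculation (rather than the sampling itself) dominates the run time.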
You can simply try `sample` along with `replicate`, e.g.,
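```r
library(data.table)

dt <- data.table(x = runif(50000))
N <- 1000  # number of bootstrap samples
L <- 3000  # size of each sample
# each column of p is one bootstrap sample of the values in x
p <- dt[, replicate(N, sample(x, L, replace = TRUE))]
```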
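A variant of the same idea, sketched here under the assumption that only the sampled values (not the row indices) are needed, draws all `L * N` values in a single `sample()` call and reshapes them into the same `L`-by-`N` layout. Each column is still an independent draw with replacement from `x`. The block then shows how a bootstrap summary could be taken column-wise. `colMeans()` is only a stand-in for the real per-sample calculation, and `boot_stat`, `se` and `ci` are illustrative names.

```r
library(data.table)

dt <- data.table(x = runif(50000))
N <- 1000  # number of bootstrap samples
L <- 3000  # size of each sample

# draw all L * N values at once instead of N separate calls,
# then reshape so that each column is one bootstrap sample
p <- matrix(sample(dt$x, L * N, replace = TRUE), nrow = L, ncol = N)

# one statistic per bootstrap sample (column)
boot_stat <- colMeans(p)

# bootstrap standard error and a 95% percentile interval
se <- sd(boot_stat)
ci <- quantile(boot_stat, c(0.025, 0.975))
```

For 10,000 samples of 3,000 doubles, `p` needs about 240 MB (3,000 × 10,000 × 8 bytes), so at that scale it may be preferable to process the samples in batches.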