r data.table doparallel

Fast Sampling with Replacement in R without using a loop or Apply


I need a fast and efficient way of sampling with replacement for my Bootstrapping exercise.

I found a similar question here, but the solution doesn't offer enough of a speed-up.

Here is an example of what I am doing

library(data.table)
library(dqrng)  # provides dqsample()

dt <- data.table(x = runif(50000))
samples <- 1000

# create a matrix of results that I will populate in my loop
# here I have done 1,000 samples but ideally I want to do 10,000
p <-  matrix(NA, 3000, samples)

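system.time({
  lapply(seq_len(samples), function(s) {

    # draw 3,000 values of x with replacement (dqsample() is from dqrng)
    include <- dqsample(dt$x, 3000, replace = TRUE)
    #sample_data <- dt[include,]

    #return <- sample_data[, get(profit_col)]
    # `<<-` is needed so the assignment reaches p outside the function
    p[, s] <<- include

  })
})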

This takes around 10 seconds to run; with the actual data it takes about 200 seconds because I do some calculations once I've sampled the data.

I then need to run this about 90 times, trying different combinations of variables, so practically speaking it is too long to wait.

I was wondering if there is a way to do it in parallel (I am running Windows 11) or to compile this into a function?


Solution

  • You can simply use sample() together with replicate(), e.g.,

    dt <- data.table(x = runif(50000))
    N <- 1000
    L <- 3000
    p <- dt[, replicate(N, sample(x, L, replace = TRUE))]
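Because draws made with replacement are independent, it may also be possible to skip replicate() altogether: draw all L * N values in one call to sample() and reshape the result into a matrix. The following is only a sketch under that assumption; set.seed() and the colMeans() step are illustrative stand-ins (the question does not say which calculations are done on each sample).

```r
library(data.table)

set.seed(1)  # only so the sketch is reproducible
dt <- data.table(x = runif(50000))
N <- 1000  # number of bootstrap samples
L <- 3000  # size of each sample

# Draw all L * N values in a single call, then reshape so that
# each column holds one bootstrap sample of size L.
p <- matrix(sample(dt$x, L * N, replace = TRUE), nrow = L, ncol = N)

# Hypothetical follow-up statistic: the mean of each bootstrap sample.
boot_means <- colMeans(p)
```

If the per-sample calculation can be written column-wise like this (colMeans(), colSums(), and so on), the whole exercise stays vectorized; otherwise the matrix p can still be built this way and the calculation applied to its columns afterwards.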