I have a list of historical frequencies of elements that have occurred together over time. These elements may have occurred (without repetition) in sequences of various order and length.
For example, this could be a list of historic sequences: abc gabd ace
My challenge is to collect a simulated sample of size n from a list of weighted probabilities. For example, a has appeared in 90% of the historic sequences, b in 70%, and so on.
What is a simple way to generate a weighted sample of 3 elements? Eventually I will put this in a loop to simulate that sample hundreds of times and collect the results, but for now generating a single sample will point me in the right direction.
library(tibble)
historical_p <-
tribble(
~element, ~p,
'a', .9,
'b', .7,
'c', .5,
'd', .1,
'e', .1,
'f', .1,
'g', .1
)
Use sample with the prob argument to generate one sample of n values chosen without replacement from the set elements with weights p:
set.seed(369894129)
element <- letters[1:7]
p <- c(0.9, 0.7, 0.5, 0.1, 0.1, 0.1, 0.1) # weights
n <- 3 # number of elements per sample
sample(element, n, FALSE, p)
#> [1] "a" "f" "b"
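Since the question mentions eventually repeating the draw in a loop, here is a minimal base R sketch of that step using replicate. The names reps, sims and freq are made up for illustration, and the seed is arbitrary. Note that sample treats prob as relative weights (they need not sum to 1), so the share of samples containing each element will generally not match the historical percentages.

```r
set.seed(1)  # arbitrary seed, only for reproducibility
element <- letters[1:7]
p <- c(0.9, 0.7, 0.5, 0.1, 0.1, 0.1, 0.1)  # relative weights
n <- 3        # number of elements per sample
reps <- 500   # hypothetical number of repetitions

# each column of sims is one weighted sample drawn without replacement
sims <- replicate(reps, sample(element, n, replace = FALSE, prob = p))

# share of simulated samples that contain each element
freq <- table(factor(sims, levels = element)) / reps
freq
```

Because every sample holds n distinct elements, the shares in freq add up to n rather than to 1.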
A way to generate N samples (inspired by this answer):
N <- 1e5 # number of samples
system.time({
s <- Rfast::colOrder(
matrix(runif(N*length(p)), length(p))^(1/p), FALSE, TRUE
)[1:n,]
s[] <- element[s]
})
#> user system elapsed
#> 0.15 0.04 0.19
View the first 10 samples.
s[,1:10]
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
#> [1,] "a" "a" "a" "c" "d" "b" "b" "c" "c" "c"
#> [2,] "c" "e" "b" "a" "c" "a" "a" "a" "f" "a"
#> [3,] "b" "f" "d" "d" "a" "c" "c" "g" "b" "b"
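The vectorised approach above gives each element a random key runif^(1/p) and keeps the n elements with the largest keys in each column. As far as I understand it, this yields weighted samples without replacement with the same distribution as repeated calls to sample. If Rfast is not available, the same idea can be sketched in base R, at the cost of speed. The names keys, idx and s2 are made up for illustration, and N is kept small here on purpose.

```r
set.seed(1)  # arbitrary seed, only for reproducibility
element <- letters[1:7]
p <- c(0.9, 0.7, 0.5, 0.1, 0.1, 0.1, 0.1)  # relative weights
n <- 3      # number of elements per sample
N <- 1000   # number of samples (smaller than above; apply is slower)

# one row per element, one column per sample; row i is raised to 1/p[i]
keys <- matrix(runif(N * length(p)), nrow = length(p))^(1 / p)

# for each column, the row indices of the n largest keys
idx <- apply(keys, 2, function(k) order(k, decreasing = TRUE)[1:n])

# map indices back to element labels, one sample per column
s2 <- matrix(element[idx], nrow = n)
s2[, 1:5]
```

The result has the same layout as s: n rows and one column per sample.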