I want to generate a weighted stratified sample where both the input and the weightings vary. The expected input is a variable-length factor of integers with a varying number of levels.
I'm trying to avoid hard-coding the weightings and strata, as they may vary. There are many questions on Stack Exchange about stratified sampling, but none that I could find avoid hard-coded values.
I'm still fairly new to R and have tried two methods: survey::svydesign() and sampling::balancedstratification(). Neither seems to accept a vector of frequency proportions to use as weightings.
variable_vector <- as.factor(c(1, 1, 1, 2, 2, 2, 2, 3))
freq_prop <- prop.table(table(variable_vector))
library(survey)
mysdesign <- svydesign(id = ~1,
                       strata = ~levels(variable_vector),
                       data = variable_vector,
                       fpc = freq_prop)
library(sampling)
sampling::balancedstratification(variable_vector,
                                 strata = levels(variable_vector),
                                 pik = freq_prop)
Neither of the above methods has worked.
The output from freq_prop is:
[1] 0.375 0.500 0.125
Now I need a way of generating random samples with a given total size, for example 30, split across the strata as follows:
sample size 1 = 30 * 0.375
sample size 2 = 30 * 0.500
sample size 3 = 30 * 0.125
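For reference, these target sizes can be computed in one vectorised step instead of one stratum at a time. This is a minimal sketch, assuming a total of 30 and plain rounding; the names n_total, raw_sizes and strata_sizes are illustrative and not part of the original attempt. The raw products are not whole numbers, so some rounding rule is needed, and rounding will not always sum exactly to the total for other inputs.

```r
# Rebuild the example factor and its level proportions
variable_vector <- as.factor(c(1, 1, 1, 2, 2, 2, 2, 3))
freq_prop <- prop.table(table(variable_vector))

n_total <- 30                      # assumed overall sample size
raw_sizes <- n_total * freq_prop   # fractional targets: 11.25, 15.00, 3.75
strata_sizes <- round(raw_sizes)   # whole-number per-stratum counts
strata_sizes
```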
You can use base R's sample() to generate a random sample. For example, to draw a random sample of size 30 from the elements {1, 2, 3} with probabilities 0.375, 0.5 and 0.125 for 1, 2 and 3 respectively, we can do the following:
set.seed(777)
r_sample <- sample(c(1, 2, 3), size = 30, replace = TRUE, prob = c(0.375, 0.5, 0.125))
table(r_sample)
# r_sample
# 1 2 3
# 13 14 3
See ?sample for the help page.
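To avoid hard-coding the values and probabilities, the same call can take both from the factor itself. This is a minimal sketch, assuming the variable_vector from the question; n_total is a hypothetical name for the desired sample size.

```r
# Derive the strata and their weights from the data instead of typing them in
variable_vector <- as.factor(c(1, 1, 1, 2, 2, 2, 2, 3))
freq_prop <- prop.table(table(variable_vector))

n_total <- 30  # assumed sample size
set.seed(777)
r_sample <- sample(names(freq_prop),            # strata labels: "1", "2", "3"
                   size = n_total,
                   replace = TRUE,
                   prob = as.vector(freq_prop)) # weights taken from the data
table(r_sample)
```

Because each draw is random, the per-stratum counts will only be close to n_total * freq_prop, not exactly equal to it. The sampled values are the level labels as character strings, so wrap them in factor() or as.integer() if the original type is needed.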