I need to weight the observations in a sample based on the marginal distributions of four demographic characteristics from a broader population. I'm currently using the package anesrake to do so.
The population info is stored in targets. This is a list containing 4 elements, one named numeric vector for each respondent attribute I want to weight my sample on. The names of each vector give the different categories. I create targets here:
quota_age <- c(0.30, 0.33, 0.37)
quota_race <- c(0.62, 0.12, 0.17, 0.5, 0.3)
quota_gender <- c(0.52, 0.48)
quota_ed <- c(0.41, 0.29, 0.19, 0.11)
names(quota_age) <- c("18 to 34", "35 to 54", "55+")
names(quota_race) <- c("White non-Hispanic", "Black non-Hispanic", "Hispanic", "Asian", "Other")
names(quota_gender) <- c("Female", "Male")
names(quota_ed) <- c("HS or less", "Some college", "Bachelors", "Advanced")
targets <- list(quota_age, quota_race, quota_gender, quota_ed)
The survey file (m1b) is a data frame containing demographic info and a unique ID for each respondent (link to google sheet here). Here are the first few observations:
> head(m1b)
ResponseId quota_ed quota_age quota_gender quota_race
1 R_3McITJbfcFuwc9x Some college 18 to 34 Female White non-Hispanic
2 R_2q3oeAbZgCZ5YcZ Bachelors 55+ Female White non-Hispanic
3 R_YSVccSQ1xJ6zuDv Advanced 35 to 54 Female White non-Hispanic
4 R_DubbKu7uJicbpQd Some college 35 to 54 Male White non-Hispanic
5 R_5zj5CNu598lCwRX Bachelors 55+ Male Other
6 R_21mPGFS7kHX2ELm Advanced 55+ Female White non-Hispanic
Using the anesrake package, I want to construct a new variable called weight that I can use to account for differences between the population and sample marginal distributions in later analyses.
But when I call the anesrake function like so (the pctlim argument is extremely small to exaggerate my point):
library(anesrake)
raking <- anesrake(inputter = targets,
                   dataframe = m1b,
                   caseid = m1b$ResponseId,
                   choosemethod = "total",
                   type = "pctlim",
                   pctlim = 0.0000001)
I get the following error:
Error in selecthighestpcts(discrep1, inputter, pctlim) :
No variables are off by more than 0.00001 percent using the method you have chosen, either weighting is
unnecessary or a smaller pre-raking limit should be chosen.
This is clearly not true, though. Consider the quota_ed target, for example:
> targets[[4]]
HS or less Some college Bachelors Advanced
0.41 0.29 0.19 0.11
> wpct(m1b$quota_ed)
Advanced Bachelors HS or less Some college
0.1614583 0.3645833 0.1666667 0.3072917
Any thoughts on what I'm doing wrong would be greatly appreciated. See this link to an RBloggers post for the routine I'm trying to emulate.
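For the anesrake function to work, the elements of targets most likely need to be named after the matching columns in m1b, since anesrake pairs each target vector with a data frame column by name:
names(targets) <- c("quota_age", "quota_race", "quota_gender", "quota_ed")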
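Before calling anesrake again, it may help to confirm that the targets and the data actually line up. The following is only a rough sketch in base R: check_targets() is a hypothetical helper (not part of anesrake), and m1b_demo and targets_demo are small made-up stand-ins for the real m1b and targets. It assumes that each target should be named after a column, that the column should be a factor whose categories match the target's names, and that each target should sum to 1. Note that quota_race as written in the question sums to 1.71, so that vector is worth double-checking too.

```r
# Hypothetical stand-in for m1b (the real survey file is not reproduced here)
m1b_demo <- data.frame(
  ResponseId   = c("R_1", "R_2", "R_3", "R_4"),
  quota_gender = factor(c("Female", "Female", "Male", "Female")),
  quota_age    = factor(c("18 to 34", "55+", "35 to 54", "55+")),
  stringsAsFactors = FALSE
)

# Named list: each element carries the name of the column it should match
targets_demo <- list(
  quota_age    = c("18 to 34" = 0.30, "35 to 54" = 0.33, "55+" = 0.37),
  quota_gender = c("Female" = 0.52, "Male" = 0.48)
)

# Hypothetical helper: returns a character vector describing any mismatches
check_targets <- function(targets, df) {
  if (is.null(names(targets)) || any(names(targets) == "")) {
    return("targets list has unnamed elements")
  }
  problems <- character(0)
  for (v in names(targets)) {
    if (!v %in% names(df)) {
      problems <- c(problems, paste0(v, ": no such column in data frame"))
      next
    }
    if (!is.factor(df[[v]])) {
      problems <- c(problems, paste0(v, ": column is not a factor"))
    }
    if (!setequal(names(targets[[v]]), unique(as.character(df[[v]])))) {
      problems <- c(problems, paste0(v, ": category labels differ from target names"))
    }
    if (abs(sum(targets[[v]]) - 1) > 1e-6) {
      problems <- c(problems, paste0(v, ": target proportions do not sum to 1"))
    }
  }
  problems
}

check_targets(targets_demo, m1b_demo)          # empty when everything lines up
check_targets(unname(targets_demo), m1b_demo)  # flags the missing names
```

If the check on the real data reports non-factor columns, converting them with factor() before raking is one thing to try.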