r survey weighted

Survey package: what does the combined.weights argument really mean?


The package documentation for the combined.weights argument in the svrepdesign function states:

TRUE if the repweights already include the sampling weights. This is usually the case.

What does "include" mean here? I have not found an example online that specifies combined.weights = TRUE and in which, according to the regular expression for repweights, the sampling weight is literally one of the repweights. As just one example, the code in this question specifies repweights = "k4natwt.rep[1-9]+", weights = ~k4natwt and combined.weights = TRUE. Clearly, the sampling weight variable ("k4natwt") is not matched as one of the replicate weights (which all take the form "k4natwt.repX", where X is some integer). Therefore, I assume "include" means something else, but I can't find out what.

A clue could be the function's own warning. I am analyzing the same survey as the question I linked to above (although not the same wave), and I get the following warning if I try to create the survey design object with combined.weights = FALSE:

Warning: Data look like combined weights: mean replication weight is 360.117082418768  and mean sampling weight is 360.02467073254

I can guess that the mean of the weights and replicate weights is relevant, but it's not immediately clear to me why.

Looking at Lumley's "Complex Surveys: A guide to analysis using R", I find this passage:

The option combined.weights specifies that the replicate weights include the sampling weight; the alternative is that they need to be multiplied by the sampling weight.

Does "include" then mean "include in their calculation"? That is, is it possible to create replicate weights from the design alone, ignoring the sampling weight, so that multiplying the resulting repweights by the sampling weight is what makes them "include" it? I feel like I'm approaching the truth but am still unsure.

So, what does combined.weights mean exactly, and how can I know, for a given survey, whether TRUE or FALSE is appropriate?


Solution

  • Since combined.weights=TRUE is the default, it's never necessary to supply it, which is presumably why you don't find examples that use it explicitly.

    combined.weights=TRUE means that the values supplied through the repweights argument are used as the replicate weights as they stand; combined.weights=FALSE means that the replicate weights are obtained by multiplying the values supplied through the repweights argument by the sampling weights.
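
As a rough illustration of the difference, here is a minimal sketch with toy values. The names w, factors and combined are made up for this example, and JK1 (delete-one jackknife) is chosen only as a convenient replication scheme; your survey may use a different one. It also shows why comparing means is a plausible way to tell the two cases apart, which is presumably what the warning in the question is based on: combined weights average about the same as the sampling weights, while bare replication factors average about 1.

```r
# Toy data: 4 units, each with a (hypothetical) sampling weight of 100.
w <- c(100, 100, 100, 100)
n <- length(w)

# JK1 replication factors: replicate r drops unit r (factor 0) and
# scales the remaining units up by n / (n - 1).
# Rows are units, columns are replicates.
factors <- matrix(n / (n - 1), nrow = n, ncol = n)
diag(factors) <- 0

# "Combined" replicate weights: the sampling weight is already multiplied in.
# w is recycled down the rows, so row i is multiplied by w[i].
combined <- factors * w

# Compare the means: combined weights are on the scale of w,
# bare factors are on the scale of 1.
print(c(sampling = mean(w), combined = mean(combined), factors = mean(factors)))

# If the survey package is installed, both ways of describing the same
# design can be passed to svrepdesign(); y is a made-up outcome variable.
if (requireNamespace("survey", quietly = TRUE)) {
  df <- data.frame(y = c(1, 2, 3, 4), w = w)

  # Replicate weights already include the sampling weight.
  des_combined <- survey::svrepdesign(
    data = df, weights = ~w, repweights = combined,
    type = "JK1", combined.weights = TRUE
  )

  # Replicate weights are bare factors, to be multiplied by the sampling weight.
  des_factors <- survey::svrepdesign(
    data = df, weights = ~w, repweights = factors,
    type = "JK1", combined.weights = FALSE
  )

  print(survey::svymean(~y, des_combined))
  print(survey::svymean(~y, des_factors))
}
```

So for a given survey, a practical check is to look at the replicate weight columns in the data: if they are on the same scale as the sampling weight (as in the warning quoted in the question, with means of about 360 for both), they are combined weights and combined.weights=TRUE is appropriate; if they hover around 1 (or are values like 0 and 2), they are factors and combined.weights=FALSE is appropriate. The survey's own documentation should confirm which one applies.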