I am using the survey package to calculate % estimates of conditions such as smoking prevalence in a set of subgeographies, along with the confidence intervals and coefficient of variation for each estimate. This is typically straightforward for me with code such as this:
svyby(~factor(SMOKING), ~COUNTY, na.rm=TRUE, dsgn, svymean, vartype=c("ci", "cv"))
My issue is that svymean (and svyby) returns an error if there's at least one variable where all factor values are the same:
Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : contrasts can be applied only to factors with 2 or more levels
Here's an example situation where it throws this error:
1 3 5 7 9 11 13 15 17 19 21 23 25 27 777 999 Sum
1 29 46 45 63 7 54 11 24 37 39 61 49 50 55 19 4 593
2 18 36 28 69 0 28 8 18 21 24 29 50 27 45 5 5 411
Sum 47 82 73 132 7 82 19 42 58 63 90 99 77 100 24 9 1004
Note that the value 9 has no values of 2.
I realize that this is expected behavior for this function, but I'm wondering if there's a way to force it to return results (other than just filtering out the value 9 from the COUNTY variable from the data entirely) both so I can run functions like this routinely without having to go back and recode for specific calculations and to return confidence intervals and coefficients of variation for the values that return either 0% or 100% (such as the value 9 in the COUNTY variable).
Yes, there is!
Basically, the issue is that you don't have a factor, you have a number . When you convert it to a factor, R has to guess what the possible levels are, and guesses that the observed levels are the only possible ones. If you do the conversion separately for each subgroup, you'll run into problems when the same factor levels aren't present in each subgroup.
The solution is to convert your variable to a factor either in advance (so all the values are present) or with explicit specification of the possible levels.
In advance
dsgn <- update(dsgn, smoking_factor=factor(smoking))
svyby(~smoking_factor, ~COUNTY, na.rm=TRUE, dsgn, svymean, vartype=c("ci", "cv"))
With explicit specification
svyby(~factor(SMOKING,levels=c(1,2)), ~COUNTY, na.rm=TRUE, dsgn, svymean, vartype=c("ci", "cv"))
The same problem happens with character strings, and in that context is explicitly described on the help page for svyby
in the Note
section. There's also a blog post here.