rsurvey

Calculating CV and CIs with svyby and svymean when a factor variable only has one level


I am using the survey package to calculate % estimates of conditions such as smoking prevalence in a set of subgeographies, along with the confidence intervals and coefficient of variation for each estimate. This is typically straightforward for me with code such as this:

   svyby(~factor(SMOKING), ~COUNTY, na.rm=TRUE, dsgn, svymean, vartype=c("ci", "cv"))

My issue is that svymean (and svyby) returns an error if there's at least one variable where all factor values are the same:

Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : contrasts can be applied only to factors with 2 or more levels

Here's an example situation where it throws this error:


        1    3    5    7    9   11   13   15   17   19   21   23   25   27  777  999  Sum
  1     29   46   45   63    7   54   11   24   37   39   61   49   50   55   19    4  593
  2     18   36   28   69    0   28    8   18   21   24   29   50   27   45    5    5  411
  Sum   47   82   73  132    7   82   19   42   58   63   90   99   77  100   24    9 1004

Note that the value 9 has no values of 2.

I realize that this is expected behavior for this function, but I'm wondering if there's a way to force it to return results (other than just filtering out the value 9 from the COUNTY variable from the data entirely) both so I can run functions like this routinely without having to go back and recode for specific calculations and to return confidence intervals and coefficients of variation for the values that return either 0% or 100% (such as the value 9 in the COUNTY variable).


Solution

  • Yes, there is!

    Basically, the issue is that you don't have a factor, you have a number . When you convert it to a factor, R has to guess what the possible levels are, and guesses that the observed levels are the only possible ones. If you do the conversion separately for each subgroup, you'll run into problems when the same factor levels aren't present in each subgroup.

    The solution is to convert your variable to a factor either in advance (so all the values are present) or with explicit specification of the possible levels.

    In advance

    dsgn <- update(dsgn, smoking_factor=factor(smoking))
    svyby(~smoking_factor, ~COUNTY, na.rm=TRUE, dsgn, svymean, vartype=c("ci", "cv"))
    

    With explicit specification

    svyby(~factor(SMOKING,levels=c(1,2)), ~COUNTY, na.rm=TRUE, dsgn, svymean, vartype=c("ci", "cv"))
    
    

    The same problem happens with character strings, and in that context is explicitly described on the help page for svyby in the Note section. There's also a blog post here.