rexpss

Handling expss cross-classification tables and missing values


I am investigating how to use the expss package to emulate traditional market research tabulations / cross-classification tables. I have come across some strange behavior regarding how "invalid" responses are handled, and wondered if there is a solution.

As background, many customer opinion surveys allow respondents to skip over a question, either because they don't know the answer or they aren't comfortable giving a reply. These respondents are recorded with values like -999 or -998, and are typically removed from percentage or mean calculations. Often they are shown in tables, for full accounting, but are not in the "base" or denominator in the percentages.

Another very common practice is to combine likert scale ratings. For example 4 and 5 ratings on a 5-point satisfaction scale are combined to make a "top 2 box" or "satisfied" grouping.

I would like to be able to make a tabulation that shows the missing responses, but then removes them from calculations. But I haven't been able to work that out with expss. The closest I've gotten is to mark values as missing within the expss statement. However this works for only the "straight" rating section of the table, not for the section that shows the 4+5 rating group.

What to do in expss? Which other package has capabilities for stat testing (both unweighted and weighted)?

Here is the code and output for a reproduceable error.

= = = = = = = = = = = = = = = =

df <- tibble(
  id = 1:15,
  quest1 = c(1, 1, 2, 2, 2, 3, 4, 4, 4, 5, 5, 5, 5, -999, -998),
  grp = c("G1", "G2", "G1", "G2", "G1", "G1", "G2", "G1", "G2", "G1", "G2", "G1", "G2", "G1", "G2"))

example1 = df %>%
  tab_cols(grp) %>%
  tab_cells(quest1,
            quest1 %in% c(4, 5)) %>%
  tab_stat_cpct() %>%
  tab_pivot()

example1

 |                     |              |  grp |      |
 |                     |              |   G1 |   G2 |
 | ------------------- | ------------ | ---- | ---- |
 |              quest1 |         -999 | 12.5 |      |
 |                     |         -998 |      | 14.3 |
 |                     |            1 | 12.5 | 14.3 |
 |                     |            2 | 25.0 | 14.3 |
 |                     |            3 | 12.5 |      |
 |                     |            4 | 12.5 | 28.6 |
 |                     |            5 | 25.0 | 28.6 |
 |                     | #Total cases |  8.0 |  7.0 |
 | quest1 %in% c(4, 5) |        FALSE | 62.5 | 42.9 |
 |                     |         TRUE | 37.5 | 57.1 |
 |                     | #Total cases |  8.0 |  7.0 |
example2 = df %>%
  tab_cols(grp) %>%
  tab_cells(quest1,
            quest1 %in% c(4, 5)) %>%
  tab_mis_val(c(-999, -998)) %>%      ## statement for exclusion
  tab_stat_cpct() %>%
  tab_pivot()

example2

 |                     |              |  grp |      |
 |                     |              |   G1 |   G2 |
 | ------------------- | ------------ | ---- | ---- |
 |              quest1 |            1 | 14.3 | 16.7 |
 |                     |            2 | 28.6 | 16.7 |
 |                     |            3 | 14.3 |      |
 |                     |            4 | 14.3 | 33.3 |
 |                     |            5 | 28.6 | 33.3 |
 |                     | #Total cases |  7.0 |  6.0 |  <- excluded
 | quest1 %in% c(4, 5) |        FALSE | 62.5 | 42.9 |
 |                     |         TRUE | 37.5 | 57.1 |
 |                     | #Total cases |  8.0 |  7.0 |  <- not excluded

Solution

  • tab_mis_val doesn't work because quest1 %in% c(4, 5) has only TRUE/FALSE values, so there are no values to remove. It is better to use special function net for top/bottom calculation. You can find manual for this function by typing ?net in the RStudio console.

    df %>%
        tab_cols(grp) %>%
        tab_cells(quest1,
                  net(mis_val(quest1, c(-999, -998)), BOTTOM = 1:3, TOP2 = 4:5)) %>% # mis_val remove missings and TOP and BOTTOM group values.
        tab_stat_cpct() %>%
        tab_pivot()
    
    
    # |              |              |  grp |      |
    # |              |              |   G1 |   G2 |
    # | ------------ | ------------ | ---- | ---- |
    # |       quest1 |         -999 | 12.5 |      |
    # |              |         -998 |      | 14.3 |
    # |              |            1 | 12.5 | 14.3 |
    # |              |            2 | 25.0 | 14.3 |
    # |              |            3 | 12.5 |      |
    # |              |            4 | 12.5 | 28.6 |
    # |              |            5 | 25.0 | 28.6 |
    # |              | #Total cases |  8.0 |  7.0 |
    # |       BOTTOM |              | 57.1 | 33.3 |
    # |         TOP2 |              | 42.9 | 66.7 |
    # | #Total cases |              |  7.0 |  6.0 |