[SOLVED] stripping value labels from imported SPSS `.sav` data

In the haven docs, I've seen examples of how zap_labels() strips value labels from variables. In each case, the variable used in the example was created by assigning a vector directly with the R assignment operator (<-) (e.g. image below, via: https://haven.tidyverse.org/reference/zap_labels.html).

[Image: zap_labels() docs example, vector created directly with <-]

However, I'm trying to use zap_labels() on data I've imported using haven's read_sav(), and it doesn't seem to be working as I expect.

Code (I'm on Windows 10):

I import a .sav file using haven like so:

library(haven)
library(dplyr)  # provides the %>% pipe

June18 <- read_sav("C:/ ... filename.sav",
  user_na = FALSE) %>%
  as_factor()

The variable I'm exploring is V1Q1_W35

Attributes:

attributes(June18$V1Q1_W35)

Output:

$levels
[1] "Very fair" "Somewhat fair" "Not very fair" "Not fair at all" "Refused"

In the original .sav file, value label mappings for V1Q1_W35 look like this:

So, per my understanding, if I apply zap_labels() to V1Q1_W35, I should see the original numbers in the data, i.e. 1, 2, 3, 4 and 99.

However, when I do the following, I still see value labels.

attributes(zap_labels(June18$V1Q1_W35))

Output:

$levels
[1] "Very fair" "Somewhat fair" "Not very fair" "Not fair at all" "Refused"

So my question is: in this situation (trying to see the different levels), what should I be doing to see the original numbers in the data instead of the value labels they are mapped to?

Solution

This is because when importing your data, you convert it to a factor with as_factor(), which in this case keeps only the labels and discards the numbers.
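To make this concrete, here is a minimal sketch. It does not use the original file: the vector x is a hypothetical stand-in for V1Q1_W35 as read_sav() would return it, built with haven's labelled() from the codes and labels mentioned in the question.

```r
library(haven)

# Hypothetical stand-in for V1Q1_W35 as returned by read_sav()
# (codes and labels taken from the question; the data values are made up)
x <- labelled(
  c(1, 2, 3, 4, 99, 1),
  labels = c("Very fair" = 1, "Somewhat fair" = 2, "Not very fair" = 3,
             "Not fair at all" = 4, "Refused" = 99)
)

# as_factor() replaces the numeric codes with their labels;
# the result is an ordinary factor with no "labels" attribute left
f <- as_factor(x)
class(f)
attr(f, "labels", exact = TRUE)

# zap_labels() has nothing to strip from a factor, so it comes back unchanged
identical(zap_labels(f), f)
```

Once the column is a factor, the codes 1, 2, 3, 4 and 99 are no longer stored anywhere in it, so they cannot be recovered from it after the fact.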

So you can either omit the as_factor() call when reading in your data and then apply zap_labels(), or convert your variables to numeric directly during import using as.numeric. You can of course also apply this to only the subset of columns where this makes sense.
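A rough sketch of both variants follows. The tibble dat is a hypothetical stand-in for what read_sav() returns when as_factor() is not applied, and the second column V1Q2_W35 is invented purely for illustration. For simplicity the sketch uses zap_labels() in both variants.

```r
library(haven)
library(dplyr)

# Codes and labels taken from the question
fair_labels <- c("Very fair" = 1, "Somewhat fair" = 2, "Not very fair" = 3,
                 "Not fair at all" = 4, "Refused" = 99)

# Hypothetical stand-in for read_sav(...) without as_factor();
# V1Q2_W35 is an invented second column
dat <- tibble(
  V1Q1_W35 = labelled(c(1, 2, 3, 4, 99), labels = fair_labels),
  V1Q2_W35 = labelled(c(2, 2, 1, 99, 4), labels = fair_labels)
)

# Variant 1: strip value labels from every column of the data frame
all_zapped <- zap_labels(dat)

# Variant 2: strip value labels from selected columns only
some_zapped <- dat %>%
  mutate(across(V1Q1_W35, zap_labels))

# The original codes are now visible
unique(all_zapped$V1Q1_W35)
```

In the second variant, V1Q2_W35 keeps its labels, so it can still be turned into a factor later with as_factor() if that is more useful for that column.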