
Turning SPSS chr+label variables into labelled factors in R


I cannot provide a reproducible example for this.

I have an SPSS dataset with chr+lbl variables, which I converted to an R object using the read_sav() function from the haven package. Calling str(foo) on one of them gives:

#' chr+lbl [1:5038]  0,  1,  1,  1,  1,  0,  1,  1,  1,  1,  1,  1,   ,  1,  1,  1,  0,  0,  1,  1,  0,  0,  1,   ,   ,   ,  1,  1,   ,  1...
#' @ label        : chr "q03_05_nonmedyn"
#' @ format.spss  : chr "A2"
#' @ display_width: int 11
#' @ labels       : Named chr [1:2] "0" "1"
#' ..- attr(*, "names")= chr [1:2] "No" "Yes"

I would like to convert them to labelled factors, but when I run them through the to_factor() function from the labelled package they come out like this:

# Factor w/ 5 levels ""," 0"," 1","No",..: 2 3 3 3 3 2 3 3 3 3 ...
# - attr(*, "label")= chr "q03_05_nonmedyn"

The labels have been added as extra levels of the factor rather than as labels imposed on top of the existing values.

The labelled::to_factor() function does what I need when the variable is a dbl+lbl rather than a chr+lbl. I assume this is what is going wrong but I am not sure what I need to do to convert the latter to the former.

Any advice much appreciated.


Solution

  • It looks like you need to do some data cleaning before converting to a factor. For example, the data seem to be using a blank for missing. (You see in the first str that there are values that seem to be empty spaces, which I assume to be missing.)

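    haven reads this column in as character, and the factor you got back has 5 different levels: "", " 0", " 1", "No" and "Yes". Note that in your str() output the data values carry a leading space (" 0", " 1"), whereas the labels are defined for "0" and "1", so the values do not match the labels and both end up as separate levels.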

    If we assume that "" is missing, you need to find out from whoever created the original data set whether "0" and "1" mean something special or whether they were supposed to be "No" and "Yes". Once you know that, you can clean up the data by

    1. recoding the "0" and "1" into "No" and "Yes", or into whatever other labels are appropriate.
    2. recoding the "" to NA.

    Then you can convert the cleaned up data into a factor.
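
As a rough illustration of those steps, here is a minimal sketch. It assumes that blanks really are missing and that "0"/"1" are meant to be "No"/"Yes"; the vector x is a made-up stand-in for your column, not your actual data. It trims the padding, turns blanks into NA, rebuilds the variable as a numeric labelled vector (dbl+lbl) and only then converts it to a factor.

```r
library(haven)  # labelled(), zap_labels(), as_factor()

# Hypothetical stand-in for the column in the question: character values
# padded to width 2 (SPSS format A2), blanks for missing, labels on "0"/"1".
x <- labelled(
  c(" 0", " 1", " 1", "  ", " 0"),
  labels = c(No = "0", Yes = "1"),
  label  = "q03_05_nonmedyn"
)

# Strip the labels and the padding so the values can match the label codes.
raw <- trimws(as.character(zap_labels(x)))  # " 0" -> "0", "  " -> ""
raw[raw == ""] <- NA                        # assumption: blank means missing

# Rebuild the labels as numbers, keeping their names ("No", "Yes").
labs     <- attr(x, "labels", exact = TRUE)
num_labs <- setNames(as.numeric(labs), names(labs))

# Numeric labelled vector (dbl+lbl) carrying the original variable label.
y <- labelled(
  as.numeric(raw),
  labels = num_labs,
  label  = attr(x, "label", exact = TRUE)
)

# Convert to a factor; labelled::to_factor(y) is the equivalent step
# if you prefer to stay with the labelled package.
f <- as_factor(y)
levels(f)
```

If "0" and "1" turn out to mean something other than "No" and "Yes", only the labels argument needs to change; the trimming and NA steps stay the same. To apply this to many columns, you could wrap the steps in a function and run it over the chr+lbl columns of the data frame.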