I am working with data imported from SPSS with read_sav() from the haven package.
The data are stored in columns of class haven_labelled, which is somewhat similar to a factor in that it holds both a value and a label, but differs in other ways.
I want to recode the values in the data together with the associated label values.
Here is an example:
library(haven)
library(dplyr)
library(labelled)
library(tidyr)
x <- structure(list(
  q0015_0001 = structure(c(3, 5, NA, 3, 1, 2, NA, NA, 3, 4, 2, NA, 2, 2, 4, NA,
                           4, 3, 3, 3, 3, 2, NA, NA, 2),
    label = "Menu Options/Variety", format.spss = "F8.2",
    labels = c(`Very Dissatisfied` = 1, Dissatisfied = 2, Neutral = 3,
               Satisfied = 4, `Very Satisfied` = 5),
    class = c("haven_labelled", "vctrs_vctr", "double")),
  q0015_0002 = structure(c(4, 4, NA, 5, 3, 3, NA, NA, 3, 4, 2, NA, 5, 2, 4, NA,
                           4, 3, 4, 4, 4, 4, NA, NA, 2),
    label = "Cleanliness", format.spss = "F8.2",
    labels = c(`Very Dissatisfied` = 1, Dissatisfied = 2, Neutral = 3,
               Satisfied = 4, `Very Satisfied` = 5),
    class = c("haven_labelled", "vctrs_vctr", "double")),
  q0015_0003 = structure(c(2, 2, NA, 3, 1, 2, NA, NA, 3, 4, 3, NA, 4, 3, 4, NA,
                           3, 2, 4, 4, 2, 2, NA, NA, 1),
    label = "Taste and Quality of Food", format.spss = "F8.2",
    labels = c(`Very Dissatisfied` = 1, Dissatisfied = 2, Neutral = 3,
               Satisfied = 4, `Very Satisfied` = 5),
    class = c("haven_labelled", "vctrs_vctr", "double"))),
  row.names = c(NA, -25L), class = c("tbl_df", "tbl", "data.frame"),
  label = "File created by user")
x
# A tibble: 25 x 3
# q0015_0001 q0015_0002 q0015_0003
# <dbl+lbl> <dbl+lbl> <dbl+lbl>
# 1 3 [Neutral] 4 [Satisfied] 2 [Dissatisfied]
# 2 5 [Very Satisfied] 4 [Satisfied] 2 [Dissatisfied]
# 3 NA NA NA
# 4 3 [Neutral] 5 [Very Satisfied] 3 [Neutral]
# 5 1 [Very Dissatisfied] 3 [Neutral] 1 [Very Dissatisfied]
# 6 2 [Dissatisfied] 3 [Neutral] 2 [Dissatisfied]
# 7 NA NA NA
# 8 NA NA NA
# 9 3 [Neutral] 3 [Neutral] 3 [Neutral]
#10 4 [Satisfied] 4 [Satisfied] 4 [Satisfied]
# ... with 15 more rows
To illustrate the column structure more clearly:
x$q0015_0001
#<labelled<double>[25]>: Menu Options/Variety
# [1] 3 5 NA 3 1 2 NA NA 3 4 2 NA 2 2 4 NA 4 3 3 3 3 2 NA NA 2
#
#Labels:
# value label
# 1 Very Dissatisfied
# 2 Dissatisfied
# 3 Neutral
# 4 Satisfied
# 5 Very Satisfied
The data include values from 1 to 5, each with a corresponding label (e.g., 1 = "Very Dissatisfied"). A haven_labelled vector can hold either numeric or character values.
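For readers less familiar with the class, a minimal sketch of the difference: a haven_labelled vector stores the actual codes as its data and keeps the label-to-value mapping in a separate labels attribute, whereas a factor always stores integer codes 1..k that index into its levels. The toy vectors v, f and chr are hypothetical, built with haven::labelled() rather than read with read_sav():

```r
library(haven)

# Hypothetical toy vector: the data are the codes, the labels sit in an attribute
v <- labelled(c(3, 5, NA, 1),
              labels = c(`Very Dissatisfied` = 1, Dissatisfied = 2, Neutral = 3,
                         Satisfied = 4, `Very Satisfied` = 5))

vctrs::vec_data(v)   # underlying values: 3, 5, NA, 1
attr(v, "labels")    # named vector mapping each label to its value

# A factor instead stores integer codes 1..k that index into its levels,
# so its codes cannot be arbitrary values such as -2
f <- factor(c("Neutral", "Very Satisfied", NA, "Very Dissatisfied"),
            levels = names(attr(v, "labels")))
as.integer(f)        # codes: 3, 5, NA, 1

# haven_labelled also accepts character values with character labels
chr <- labelled(c("y", "n", "y"), labels = c(Yes = "y", No = "n"))
attr(chr, "labels")
```

Because the mapping lives in attr(v, "labels"), changing only the data leaves that attribute untouched.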
I wish to change the values from c(1, 2, 3, 4, 5) to c(-2, -1, 0, 1, 2) but preserve the labels in the same order (i.e., -2 = "Very Dissatisfied", etc.):
Label | Old Value | New Value |
---|---|---|
Very Dissatisfied | 1 | -2 |
Dissatisfied | 2 | -1 |
Neutral | 3 | 0 |
Satisfied | 4 | 1 |
Very Satisfied | 5 | 2 |
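Expressed in code, the table is a lookup that has to be applied twice: once to the data values and once to the values stored in the labels attribute. A minimal sketch using base R plus haven::labelled(), on a short hypothetical vector v rather than the x above:

```r
library(haven)

# Old and new codes, in the same order as the table
old_vals <- c(1, 2, 3, 4, 5)
new_vals <- c(-2, -1, 0, 1, 2)

# Hypothetical vector with the same value labels as q0015_0001
v <- labelled(c(3, 5, NA, 1),
              labels = c(`Very Dissatisfied` = 1, Dissatisfied = 2, Neutral = 3,
                         Satisfied = 4, `Very Satisfied` = 5),
              label = "Menu Options/Variety")

# 1. Recode the data: match() gives each value's position in old_vals (NA stays NA)
new_data <- new_vals[match(vctrs::vec_data(v), old_vals)]

# 2. Recode the label values with the same lookup, keeping the label names
old_labels <- attr(v, "labels")
new_labels <- setNames(new_vals[match(old_labels, old_vals)], names(old_labels))

# Rebuild the labelled vector, carrying over the variable label
v_new <- labelled(new_data, labels = new_labels, label = attr(v, "label"))
v_new
```

Since this particular mapping is a constant shift, subtracting 3 from both the data and the labels would give the same result; the match() lookup also covers mappings that are not a simple shift.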
The closest I have come is using dplyr::recode(). The labelled package is supposed to extend dplyr::recode() to work with labelled vectors [1], but I haven't noticed any difference with or without it loaded.
dplyr::recode(x$q0015_0001,`1` = -2, `2` = -1, `3` = 0, `4` = 1, `5` = 2)
#<labelled<double>[25]>: Menu Options/Variety
# [1] 0 2 NA 0 -2 -1 NA NA 0 1 -1 NA -1 -1 1 NA 1 0 0 0 0 -1 NA NA -1
#
#Labels:
# value label
# 1 Very Dissatisfied
# 2 Dissatisfied
# 3 Neutral
# 4 Satisfied
# 5 Very Satisfied
Notice that the values in the data changed as expected (3 became 0, 5 became 2, etc.) but the label values did not. This means that if you use as_factor() (haven's labelled-vector equivalent of as.factor()) to work with the labels instead of the values, the labels will be wrong. The effect on the data is clearer when the values and labels are viewed together:
x %>%
mutate(across(starts_with("q0015"),
~recode(., `1` = -2, `2` = -1, `3` = 0, `4` = 1, `5` = 2)))
# A tibble: 25 x 3
#q0015_0001 q0015_0002 q0015_0003
#<dbl+lbl> <dbl+lbl> <dbl+lbl>
#1 0 1 [Very Dissatisfied] -1
#2 2 [Dissatisfied] 1 [Very Dissatisfied] -1
#3 NA NA NA
#4 0 2 [Dissatisfied] 0
#5 -2 0 -2
#6 -1 0 -1
#7 NA NA NA
#8 NA NA NA
#9 0 0 0
#10 1 [Very Dissatisfied] 1 [Very Dissatisfied] 1 [Very Dissatisfied]
# ... with 15 more rows
As shown, the labels still map to the old values: in the recoded data, 1 and 2 are positive scores but still display as Very Dissatisfied/Dissatisfied, while -2, -1 and 0 have no label at all.
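The mismatch can also be checked directly with haven::as_factor(), which replaces each value by its label where one exists and otherwise falls back to the value itself. A small sketch on a hypothetical vector that mimics the recoded data (new values, old labels):

```r
library(haven)

# Recoded values, but the labels attribute still holds the old 1..5 mapping
recoded <- labelled(c(0, 2, -2, 1),
                    labels = c(`Very Dissatisfied` = 1, Dissatisfied = 2, Neutral = 3,
                               Satisfied = 4, `Very Satisfied` = 5))

# 0 and -2 have no label, so they fall back to "0" and "-2";
# 2 and 1 pick up the old labels "Dissatisfied" and "Very Dissatisfied"
as.character(as_factor(recoded))
```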
Question: How can I recode labelled vectors so that the data values and the label values are updated together, with the labels preserved and mapped to the new values?
It's ugly AF, but it does the job. The problem is that setting value labels is not straightforward. The labelled package offers functions for it, but these aren't "tidyverse-ready", i.e. they don't work within mutate(), nor do they allow selecting variables with tidyselect helpers like starts_with().
However, set_value_labels() allows passing a list in which each element is named after the variable you want to label and holds the labels themselves as a named vector:
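x |>
  # Recode the data values in every q0015 column
  mutate(across(starts_with("q0015"),
                ~dplyr::recode(., `1` = -2, `2` = -1, `3` = 0, `4` = 1, `5` = 2))) |>
  # Replace the value labels: one list element per q0015 column,
  # each holding the same named vector of new labels
  set_value_labels(.labels = rep(list(c("Very Dissatisfied" = -2,
                                        "Dissatisfied" = -1,
                                        "Neutral" = 0,
                                        "Satisfied" = 1,
                                        "Very Satisfied" = 2)),
                                 x |>
                                   select(starts_with("q0015")) |>
                                   ncol()) |>
                     # Name the list elements after the columns they apply to
                     setNames(nm = x |>
                                select(starts_with("q0015")) |>
                                names()))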
which gives:
# A tibble: 25 × 3
q0015_0001 q0015_0002 q0015_0003
<dbl+lbl> <dbl+lbl> <dbl+lbl>
1 0 [Neutral] 1 [Satisfied] -1 [Dissatisfied]
2 2 [Very Satisfied] 1 [Satisfied] -1 [Dissatisfied]
3 NA NA NA
4 0 [Neutral] 2 [Very Satisfied] 0 [Neutral]
5 -2 [Very Dissatisfied] 0 [Neutral] -2 [Very Dissatisfied]
6 -1 [Dissatisfied] 0 [Neutral] -1 [Dissatisfied]
7 NA NA NA
8 NA NA NA
9 0 [Neutral] 0 [Neutral] 0 [Neutral]
10 1 [Satisfied] 1 [Satisfied] 1 [Satisfied]
# … with 15 more rows
# ℹ Use `print(n = ...)` to see more rows
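The rep()/setNames() construction passed to .labels can also be pulled out into named steps, which makes the list easier to inspect before it is applied. A minimal sketch on a hypothetical data frame whose q columns are already recoded (df, q1, q2, other, new_labels and label_list are illustrative names only):

```r
library(labelled)

# Target labels for the recoded scale
new_labels <- c("Very Dissatisfied" = -2, "Dissatisfied" = -1, "Neutral" = 0,
                "Satisfied" = 1, "Very Satisfied" = 2)

# Hypothetical toy data with already-recoded values; "other" should stay unlabelled
df <- data.frame(q1 = c(0, 2, -2), q2 = c(1, -1, NA), other = c(10, 20, 30))

# One list element per target column, each holding the same named vector
vars <- grep("^q", names(df), value = TRUE)
label_list <- setNames(rep(list(new_labels), length(vars)), vars)

df <- set_value_labels(df, .labels = label_list)
val_labels(df$q1)
```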
I was curious and asked the developer of the labelled package: an alternative is to write a small function that recodes and relabels a single variable, and then run that function within across():