I have labeled SPSS data like this:
library(labelled)
library(tidyverse)
test <- tibble(
  var_1 = labelled_spss(
    c(1:4, 89, 999),
    c(Terrible = 1, Meh = 2, Better = 3, Awesome = 4, dk = 89, "does not apply" = 999)
  ),
  var_2 = labelled_spss(
    c(1:4, 890, 998),
    c(Terrible = 1, Meh = 2, Better = 3, Awesome = 4, dk = 890, "does not apply" = 998)
  )
)
test
# A tibble: 6 × 2
  var_1                var_2
  <dbl+lbl>            <dbl+lbl>
1   1 [Terrible]         1 [Terrible]
2   2 [Meh]              2 [Meh]
3   3 [Better]           3 [Better]
4   4 [Awesome]          4 [Awesome]
5  89 [dk]             890 [dk]
6 999 [does not apply] 998 [does not apply]
Note the different numeric values for dk and "does not apply".
I would like to set dk and "does not apply" as NAs programmatically (i.e. without naming each variable individually), based on the label rather than the value.
My idea is something like this pseudocode:
my_na_labels <- c("dk", "does not apply")
test %>%
mutate(across(c(var_1, var_2), ~ set_na_values(. %in% my_na_labels)))
Unfortunately, this does not work.
The solution given in the labelled vignette uses the variable names and tags NAs based on their numeric values. Programmatically tagging NAs by numeric value does not work here, because the same label is attached to different numeric values in different variables. I am therefore looking for a solution that works with the existing labels and does not require hard-coded numeric values.
The outcome, which I can easily produce if I use hard-coded NA values, should look something like this and should be generalizable for many variables:
test %>%
set_na_values(var_1 = c(89, 999),
var_2 = c(890, 998))
# A tibble: 6 × 2
  var_1                     var_2
  <dbl+lbl>                 <dbl+lbl>
1   1 [Terrible]              1 [Terrible]
2   2 [Meh]                   2 [Meh]
3   3 [Better]                3 [Better]
4   4 [Awesome]               4 [Awesome]
5  89 (NA) [dk]             890 (NA) [dk]
6 999 (NA) [does not apply] 998 (NA) [does not apply]
How about:
library(labelled)
library(dplyr)
my_na_labels <- c("dk", "does not apply")
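# Mark the values behind the given labels as user-defined missing values
fun <- function(x, varlabels) {
  # val_labels(x) is a named vector (label -> value), so index it by label name
  na_values(x) <- val_labels(x)[varlabels]
  return(x)
}
test |>
  mutate(across(c("var_1", "var_2"), ~ fun(., varlabels = my_na_labels)))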
Output:
# A tibble: 6 × 2
  var_1                     var_2
  <dbl+lbl>                 <dbl+lbl>
1   1 [Terrible]              1 [Terrible]
2   2 [Meh]                   2 [Meh]
3   3 [Better]                3 [Better]
4   4 [Awesome]               4 [Awesome]
5  89 (NA) [dk]             890 (NA) [dk]
6 999 (NA) [does not apply] 998 (NA) [does not apply]
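If you also want to avoid listing the column names, a possible variation is to select every labelled column with where(). The following is a minimal sketch, not a tested general solution: set_na_by_label is a hypothetical helper name, and it assumes you want columns that lack one of the labels to be handled by using only the labels that are actually present (indexing val_labels(x) by a missing name yields NA, which I believe na_values does not accept).

```r
library(labelled)
library(dplyr)

# Example data from the question
test <- tibble(
  var_1 = labelled_spss(
    c(1:4, 89, 999),
    c(Terrible = 1, Meh = 2, Better = 3, Awesome = 4, dk = 89, "does not apply" = 999)
  ),
  var_2 = labelled_spss(
    c(1:4, 890, 998),
    c(Terrible = 1, Meh = 2, Better = 3, Awesome = 4, dk = 890, "does not apply" = 998)
  )
)

my_na_labels <- c("dk", "does not apply")

# Hypothetical helper (the name is illustrative, not part of labelled):
# look up the values attached to the requested labels and tag them as
# user-defined missing values, skipping labels the column does not have.
set_na_by_label <- function(x, na_labels) {
  labs <- val_labels(x)                           # named vector: label -> value
  hits <- unname(labs[names(labs) %in% na_labels])
  if (length(hits) > 0) {
    na_values(x) <- hits
  }
  x
}

# Apply the helper to every labelled column instead of naming them
result <- test |>
  mutate(across(where(haven::is.labelled), \(x) set_na_by_label(x, my_na_labels)))

na_values(result$var_1)
na_values(result$var_2)
```

After this, is.na(result$var_1) should report the tagged values as missing, while the underlying numeric codes and labels stay in place.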