I'm using the labelled package to add value labels to a dataset that I'm working on. Many of the variables share the same set of labels whenever the column name ends in some variant of _p1, _p2, etc. So I want to use the set_value_labels function from the labelled package to set all of these labels at once rather than typing in the name of each variable.
A reproducible dataframe for this purpose could be
df <- data.frame(V1 = c(1,2,3), V2_p1 = c(1,0,1), V2_p2 = c(0,0,1), V3_pX = c(1,0,1), V4_pY = c(1,0,0), V5 = c(3,2,1))
My desired outcome is to attach the value labels 1 = "Yes" and 0 = "No" to every variable except V1 and V5, since those do not end in a _p suffix, while keeping all other variables in the data frame. The real dataset I am working with is much larger and has many more variables in this category, so I would rather not locate and type every single variable name for the function.
So far I have tried variations of
df <- df %>%
  select(matches(".*_p(\\d|X|Y)$")) %>%
  set_value_labels(c("Yes" = 1, "No" = 0))
set_value_labels(df, select(df, matches(".*_p(\\d|X|Y)"))), c("Yes" = 1, "No" = 0)
set_value_labels(select(df, matches(".*_p(\\d|X|Y)"))), c("Yes" = 1, "No" = 0)
but can't get the function to accept the selected variables. Typically it throws an error that says "some variables are not found in .data" so I imagine the problem is with the way the function is interpreting the results of select.
Any idea of how I can accomplish this? I'm open to using different methods of selecting the variables as long as it is clean and effective but would prefer to stay with the labelled package if possible for the purposes of applying the labels to the dataframe.
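To label all variables in a data frame at once, you have to pass the value labels via the .labels= argument. Passed positionally, the vector ends up in ..., which expects pairs of the form column = labels, and that is most likely what triggers the "not found in .data" error: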
library(labelled)
library(dplyr, warn.conflicts = FALSE)
val_labels(df)
#> $V1
#> NULL
#>
#> $V2_p1
#> NULL
#>
#> $V2_p2
#> NULL
#>
#> $V3_pX
#> NULL
#>
#> $V4_pY
#> NULL
#>
#> $V5
#> NULL
# Note: select() keeps only the matching columns, so V1 and V5 are dropped here
df <- df %>%
  select(matches(".*_p(\\d|X|Y)$")) %>%
  set_value_labels(.labels = c("Yes" = 1, "No" = 0))
val_labels(df)
#> $V2_p1
#> Yes No
#> 1 0
#>
#> $V2_p2
#> Yes No
#> 1 0
#>
#> $V3_pX
#> Yes No
#> 1 0
#>
#> $V4_pY
#> Yes No
#> 1 0
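Since the question asks to keep the other variables, here is a minimal sketch of one way to label only the matching columns in place rather than subsetting first. It swaps set_value_labels for dplyr's across() combined with labelled(), which the labelled package re-exports from haven. The name yes_no is a hypothetical helper introduced only for this example, and the code assumes the matching columns are numeric, as in the example data.

```r
library(labelled)
library(dplyr, warn.conflicts = FALSE)

# Example data from the question
df <- data.frame(V1 = c(1,2,3), V2_p1 = c(1,0,1), V2_p2 = c(0,0,1),
                 V3_pX = c(1,0,1), V4_pY = c(1,0,0), V5 = c(3,2,1))

# Hypothetical helper name: the shared set of value labels
yes_no <- c("Yes" = 1, "No" = 0)

# Label only the columns ending in _p<digit>, _pX or _pY;
# all other columns (V1, V5) are kept and left unlabelled
df <- df %>%
  mutate(across(matches("_p(\\d|X|Y)$"), ~ labelled(.x, labels = yes_no)))

names(df)             # all six columns are still present
val_labels(df$V2_p1)  # value labels on a matching column
val_labels(df$V1)     # no value labels on a non-matching column
```

Because mutate() modifies columns in place, nothing is dropped from the data frame, and the regular expression is the only place where the target columns are specified.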