In the dplyr package, recode() has been superseded in favor of case_match(). Is there a way to use labels stored in, for example, a named character vector to recode values with case_match()?
For example, with recode() I can store labels in a named character vector (or read them from a CSV file) and use them for recoding:
library(dplyr)

lbls <- c(
  'male' = 'Man',
  'female' = 'Woman'
)

starwars %>%
  select(sex) %>%
  mutate(
    sex = recode(sex, !!!lbls)
  )
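The "read them from a CSV file" step can be sketched as below. This is a minimal sketch assuming a two-column file whose old values are in a column called from and new values in a column called to (both column names are hypothetical). The CSV is inlined with read.csv(text = ...) so the example runs on its own. setNames() turns the two columns into the same kind of named character vector as lbls.

```r
library(dplyr)

# Hypothetical CSV: one row per label, old value in `from`, new value in `to`.
csv_text <- "from,to
male,Man
female,Woman"

labels_df <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Named character vector: names are the old values, values are the new ones.
lbls_from_csv <- setNames(labels_df$to, labels_df$from)

# Used exactly like the hand-written `lbls` in the question.
recoded <- starwars %>%
  select(sex) %>%
  mutate(sex = recode(sex, !!!lbls_from_csv))
```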
# A tibble: 87 × 1
# sex
# <chr>
# 1 Man
# 2 none
# 3 none
# 4 Man
# 5 Woman
# ...
However, since case_match() requires two-sided formulas (old_values ~ new_value), that does not work. Is there a way to use stored values in case_match() as well?
You can create a set of rules to be evaluated.
tidyverse approach
As you're using dplyr, let's go all in:
(rules <- glue::glue('"{lbl}" ~ "{val}"', lbl = names(lbls), val = lbls))
# "male" ~ "Man"
# "female" ~ "Woman"
You can then turn this character vector into a list of call objects with rlang::parse_exprs(), and inject that list into the function call as arguments using the splice operator, !!!:
starwars |>
  select(sex) |>
  mutate(
    sex = case_match(
      sex,
      !!!rlang::parse_exprs(rules),
      .default = sex
    )
  )
# # A tibble: 87 × 1
# sex
# <chr>
# 1 Man
# 2 none
# 3 none
# 4 Man
# 5 Woman
# 6 Man
# 7 Woman
# 8 none
# 9 Man
# 10 Man
# # ℹ 77 more rows
# # ℹ Use `print(n = ...)` to see more rows
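If you'd rather not build R code as text at all, a variation (a swapped-in technique, not the one used above) is to create the formula objects directly with rlang::new_formula(), which accepts atomic values such as strings on either side. This is a minimal sketch assuming the same lbls vector. It relies on case_match() accepting dynamic dots, so the list can be spliced with !!! in a direct call; the same splice should also work inside mutate() as in the answer above.

```r
library(dplyr)
library(rlang)

lbls <- c('male' = 'Man', 'female' = 'Woman')

# One formula per label, e.g. "male" ~ "Man", built from the names (old values)
# and values (new values) of `lbls`. unname() drops the names that Map() adds,
# so the spliced arguments are unnamed formulas.
forms <- unname(Map(
  function(from, to) new_formula(from, to),
  names(lbls),
  unname(lbls)
))

# Splice the formulas into case_match(); unmatched values are kept via .default.
recoded <- case_match(starwars$sex, !!!forms, .default = starwars$sex)
```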
We can also do the parsing and splicing in base R. For me it's a little clearer what's going on. We can define the rules with sprintf() instead of glue, as suggested by Darren Tsai. This time the first element is "sex", so that it becomes the first argument of case_match():
rules <- c(
  "sex",
  sprintf('"%s" ~ "%s"', names(lbls), lbls)
)
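One caveat of building rules from a '"%s" ~ "%s"' template (the glue version has the same issue) is that a label containing a double quote produces code that no longer parses. A minimal sketch of a more defensive variant, assuming hypothetical labels with embedded quotes: base R's encodeString(quote = '"') adds the surrounding quotes and escapes the contents, so each rule parses to the intended strings.

```r
# Hypothetical labels: the new value for "male" contains double quotes.
lbls_q <- c('male' = 'Man ("M")', 'female' = 'Woman')

# The plain template yields "male" ~ "Man ("M")", which is not valid R code.
naive <- sprintf('"%s" ~ "%s"', names(lbls_q), lbls_q)
naive_ok <- !inherits(try(str2lang(naive[1]), silent = TRUE), "try-error")

# encodeString(quote = '"') wraps each value in double quotes and escapes
# embedded quotes and backslashes, so every rule parses as intended.
rules_q <- c(
  "sex",
  sprintf("%s ~ %s",
          encodeString(names(lbls_q), quote = '"'),
          encodeString(unname(lbls_q), quote = '"'))
)
parsed <- lapply(rules_q, str2lang)
```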
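To turn the character vector into a list of language objects, we can use str2lang() instead of parse_exprs(): "sex" becomes a symbol and each rule becomes a formula call. Where the tidyverse version uses !!! to splice a list of arguments into case_match(), base R uses do.call(), which calls a function with its arguments supplied as a list.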
starwars |>
  select(sex) |>
  mutate(
    sex = do.call(
      case_match,
      c(
        lapply(rules, str2lang),
        list(.default = sex)
      )
    )
  )
# # A tibble: 87 × 1
# sex
# <chr>
# 1 Man
# 2 none
# 3 none
# 4 Man
# 5 Woman
# <etc>
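The correspondence between !!! and do.call() can be checked on a small case unrelated to recoding. This sketch uses a hypothetical argument list and paste(): do.call() builds the call from the list, while rlang::inject() lets the same list be spliced with !!!, and both produce the same result.

```r
library(rlang)

# Hypothetical argument list, including a named argument.
args <- list("a", "b", sep = "-")

via_do_call <- do.call(paste, args)    # base R: call paste() with a list of arguments
via_splice  <- inject(paste(!!!args))  # rlang: splice the same list with !!!
```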
.default
Note that, unlike recode(), case_match() has to be given the .default argument if unmatched values should be kept. From the documentation:
"The value used when values in .x aren't matched by any of the LHS inputs. If NULL, the default, a missing value will be used."
If .default is not provided, any value without a rule (e.g. "none") becomes NA.
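A minimal sketch of that difference on a short hypothetical vector: without .default the unmatched "none" becomes NA, while passing the input itself as .default keeps it unchanged.

```r
library(dplyr)

x <- c("male", "none", "female")

# No .default: values not matched by any formula become NA.
no_default <- case_match(x, "male" ~ "Man", "female" ~ "Woman")
# "Man" NA "Woman"

# .default = x: unmatched values are passed through unchanged.
with_default <- case_match(x, "male" ~ "Man", "female" ~ "Woman", .default = x)
# "Man" "none" "Woman"
```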