
Recode values using case_match() with a named character vector


In the dplyr package, recode() has been superseded in favor of case_match(). Is there a way to use labels stored in, for example, a named character vector to recode values with case_match()?

For example, with recode() I can store labels in a named character vector (or read them from a CSV file) and use them for recoding:

library(dplyr)

lbls <- c(
    'male' = 'Man',
    'female' = 'Woman'
)

starwars %>%
    select( sex ) %>%
    mutate(
        sex = recode( sex, !!!lbls )
    )

# A tibble: 87 × 1
#   sex  
#   <chr>
# 1 Man  
# 2 none 
# 3 none 
# 4 Man  
# 5 Woman
# ...

However, since case_match() requires two-sided formulas (old_values ~ new_value), splicing the named vector with !!!lbls does not work there. Is there a way to use stored values with case_match() as well?


Solution

  • You can build the set of rules (two-sided formulas) as strings from the stored labels, then parse and evaluate them.

    tidyverse approach

    As you're using dplyr, let's go all in:

    (rules <- glue::glue('"{lbl}" ~ "{val}"', lbl = names(lbls), val = lbls))
    # "male" ~ "Man"
    # "female" ~ "Woman"
    

    You can then turn this character vector into a list of call objects with rlang::parse_exprs(). Then inject the list into the function call as arguments using the splice operator, !!!:

    starwars |>
        select(sex) |>
        mutate(
            sex = case_match(
                sex,
                !!!rlang::parse_exprs(rules),
                .default = sex
            )
        )
    # # A tibble: 87 × 1
    #    sex  
    #    <chr>
    #  1 Man  
    #  2 none 
    #  3 none 
    #  4 Man  
    #  5 Woman
    #  6 Man  
    #  7 Woman
    #  8 none 
    #  9 Man  
    # 10 Man  
    # # ℹ 77 more rows
    # # ℹ Use `print(n = ...)` to see more rows
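
    As a possible variation on the string-based rules above, the formulas can also be built directly, which sidesteps quoting problems if a label itself contains a double quote. This is only a sketch of a swapped-in technique, not part of the approach above: it assumes rlang::new_formula() to construct each formula and base Map() to pair old and new values; the names formulas and recoded are purely illustrative.

```r
library(dplyr)

lbls <- c(male = "Man", female = "Woman")

# Build one two-sided formula per label, without going through strings.
# Map() names its result after the first argument; unname() drops those
# names so the formulas are spliced as positional arguments.
formulas <- unname(Map(rlang::new_formula, names(lbls), lbls))

recoded <- starwars |>
    select(sex) |>
    mutate(sex = case_match(sex, !!!formulas, .default = sex))

head(recoded, 5)
```

    The result should match the output shown above; the only difference is that no parsing step is needed.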
    

    base R approach

    We can also do the parsing and splicing in base R. For me it's a little clearer what's going on. We can define rules with sprintf() instead of glue, as suggested by Darren Tsai.

    rules <- c(
        "sex",
        sprintf('"%s" ~ "%s"', names(lbls), lbls)
    )
    

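    To turn the character vector into a list of language objects, we can use str2lang() instead of parse_exprs(). The first element, "sex", parses to a symbol that becomes the first argument (.x) of case_match(); the remaining elements parse to formula calls. Since !!! is essentially a way of calling case_match() with a list of arguments, the base R equivalent is do.call().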

    starwars |>
        select(sex) |>
        mutate(
            sex = do.call(
                case_match,
                c(
                    lapply(rules, str2lang),
                    list(.default = sex)
                )
            )
        )
    # # A tibble: 87 × 1
    #    sex
    #    <chr>
    #  1 Man
    #  2 none
    #  3 none
    #  4 Man
    #  5 Woman
    #  <etc>
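
    To see what do.call() receives in the base R version, it can help to inspect the parsed rules on their own. The following is a small sketch using only base R; the name parsed is introduced here for illustration.

```r
lbls <- c(male = "Man", female = "Woman")
rules <- c("sex", sprintf('"%s" ~ "%s"', names(lbls), lbls))

parsed <- lapply(rules, str2lang)

# The first element is the bare symbol `sex`, which ends up as the
# first argument of case_match() and is looked up in the data.
is.symbol(parsed[[1]])

# The remaining elements are unevaluated calls to `~`; they only
# become formulas once case_match() is actually called.
is.call(parsed[[2]])
deparse(parsed[[2]])
```

    Both checks return TRUE, which is why the "sex" entry has to come first in rules: the elements are passed to case_match() positionally.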
    

    A note on .default

    Note that, unlike recode(), case_match() does not keep unmatched values by default, so we need to supply the .default argument. From the documentation:

    The value used when values in .x aren't matched by any of the LHS inputs. If NULL, the default, a missing value will be used.

    If .default is not provided, any value that is not matched by a rule (e.g. "none") becomes NA.
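
    The effect of .default is easiest to see on a short vector. This is a minimal sketch with made-up input; the names x, without_default and with_default are illustrative only.

```r
library(dplyr)

x <- c("male", "none", "female")

# Without .default, the unmatched "none" is replaced by NA
without_default <- case_match(x, "male" ~ "Man", "female" ~ "Woman")

# With .default = x, unmatched values pass through unchanged
with_default <- case_match(x, "male" ~ "Man", "female" ~ "Woman", .default = x)

without_default
with_default
```

    Passing the input itself as .default reproduces the behaviour of recode() for character vectors, where unmatched values are left as they are.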