r dplyr r-factor mutate

Specify levels when `mutate`-ing dataframe variables to factors


Let's say I have the following tibble dataframe called data:

library(tibble)

data <- tribble(
    ~"ID", ~"some factor", ~"some other factor", 
    1L, "low", "high",
    2L, "very high", "low",
    3L, "very low", "low",
    4L, "high", "very high",
    5L, "very low", "very low"
)

I use the fct() function in forcats to convert my two factor variables accordingly:

library(dplyr)
library(forcats)

data <- data %>%
        mutate(across(starts_with("some"), fct))

Which gives me:

# A tibble: 5 × 3
     ID `some factor` `some other factor`
  <int> <fct>         <fct>              
1     1 low           high               
2     2 very high     low                
3     3 very low      low                
4     4 high          very high          
5     5 very low      very low 

However, when I call fct this way, it's unclear to me how to specify the levels of these ordinal variables. The order I would like is:

order <- c("very low", "low", "high", "very high")

How should I do this with dplyr's set of functions? The goal is to have ggplot2 visualizations that respect this ordering.


Solution

  • When you use across(), you can pass extra arguments along to the called function through across()'s ... argument:

    data <- data %>%
      mutate(across(starts_with("some"), fct, levels = order))
    

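    This is equivalent to the anonymous-function form below. Since dplyr 1.1.0, passing arguments through across()'s ... is deprecated (it still works but emits a warning), so this is the form to prefer: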

    data <- data %>%
      mutate(across(starts_with("some"), function(x) fct(x, levels = order)))
    

    (This is a common paradigm in R: many functions that apply another function have a ... argument whose contents are passed along to the applied function. See also lapply, sapply, purrr::map, etc.)
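To confirm that the levels ended up in the intended order, a minimal end-to-end sketch follows. It rebuilds the tibble from the question and uses the anonymous-function form. The name level_order is a hypothetical rename of the question's order vector, chosen only to avoid shadowing base::order().

```r
library(tibble)
library(dplyr)
library(forcats)

# Same data as in the question
data <- tribble(
  ~"ID", ~"some factor", ~"some other factor",
  1L, "low", "high",
  2L, "very high", "low",
  3L, "very low", "low",
  4L, "high", "very high",
  5L, "very low", "very low"
)

# Desired level order (hypothetical name; the question calls this `order`)
level_order <- c("very low", "low", "high", "very high")

# Convert every column whose name starts with "some" to a factor,
# passing the levels explicitly via an anonymous function
data <- data %>%
  mutate(across(starts_with("some"), function(x) fct(x, levels = level_order)))

# Inspect the resulting levels of both columns
levels(data$`some factor`)
levels(data$`some other factor`)
```

Both calls to levels() should return the four values in the order given by level_order. ggplot2 orders discrete axes and legends by factor levels, so plots built from these columns should follow the same ordering. Note that fct() raises an error if a column contains a value missing from levels, which helps catch typos in the level vector.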