dplyr forcats

Drop unused levels for all variables in a data frame using dplyr


I would like to use dplyr instead of base R to drop unused factor levels in R. For a single factor this can be done with fct_drop from forcats. However, I have to call it manually for every variable in my data frame. Is it possible to do this for all variables at once?

Thanks in advance

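library(dplyr)    # filter()
library(forcats)  # fct_drop()

x <- data.frame(region = factor(c('P1', 'P2', 'P3', 'P4', 'P5')),
                country = factor(c('C1', 'C2', 'C3', 'C4', 'C5')),
                sales = c(103, 106, 202, 257, 324))
x1 <- filter(x, sales < 250)
# Works, but only for one column at a time
fct_drop(x1$region)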

Solution

  • You can do this with dplyr's mutate(across()) pattern: where(is.factor) selects every factor column, and fct_drop is applied to each of them:

    library(tidyverse)
    
    x <- data.frame(region=factor(c('P1', 'P2', 'P3', 'P4', 'P5')),
                    country=factor(c('C1', 'C2', 'C3', 'C4', 'C5')),
                    sales = c(103, 106, 202, 257, 324))
    x1 <- filter(x, sales < 250)
    
    x1 |> 
      mutate(across(where(is.factor), fct_drop)) |> 
      str()
    #> 'data.frame':    3 obs. of  3 variables:
    #>  $ region : Factor w/ 3 levels "P1","P2","P3": 1 2 3
    #>  $ country: Factor w/ 3 levels "C1","C2","C3": 1 2 3
    #>  $ sales  : num  103 106 202
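
As a quick sanity check, the sketch below counts the levels of each factor column before and after the mutate(across()) call. The names before, after, and x2 are introduced here only for illustration. It also compares the result against base R's droplevels(), which accepts a whole data frame; that comparison is meant as a cross-check under the assumption that both approaches treat this example the same way, not as part of the dplyr solution.

```r
library(dplyr)
library(forcats)

x <- data.frame(region  = factor(c("P1", "P2", "P3", "P4", "P5")),
                country = factor(c("C1", "C2", "C3", "C4", "C5")),
                sales   = c(103, 106, 202, 257, 324))
x1 <- filter(x, sales < 250)

# Number of levels per factor column before dropping (filter keeps all 5)
before <- sapply(Filter(is.factor, x1), nlevels)

# Drop unused levels in every factor column
x2 <- mutate(x1, across(where(is.factor), fct_drop))

# Number of levels per factor column after dropping
after <- sapply(Filter(is.factor, x2), nlevels)

print(before)
print(after)

# Cross-check against base R, which drops unused levels for all factor columns
print(all.equal(x2, droplevels(x1)))
```

If only some factor columns should be cleaned, where(is.factor) can be swapped for an explicit selection such as c(region, country).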