
How to Use forcats::fct_collapse in a Function Across Different Dataframes with Different Factor Levels


library(tidyverse)
library(forcats)

I have two simple dataframes (code at bottom) and I want to create a new recoded variable by collapsing the "Animal" column. I usually do this with forcats::fct_collapse. However, I want to make a function to apply fct_collapse to many different dataframes that have the same variables, except that some might be missing one or two of the factor levels. For example, in this case, Df2 is missing "Rhino".

Is there a way I can change the code (using pkg:tidyverse) so that factor categories that are missing will be returned as NA? In this example I know it's "Rhino", but in my real data there may be other missing levels. I'm open to other options besides forcats::fct_collapse, but I would like to stay within the realm of tidyverse.

REC <- function(Df, Data){
  Df %>%
    mutate(NEW = fct_collapse(Data, One = c("Cat","Dog","Snake"),
                              Two = c("Elephant","Bird","Rhino")))
}

REC(Df1, Animal)  # this works
REC(Df2, Animal)  # this doesn't; it throws an error because of "Rhino"

Sample Data:

Animal <- c("Cat","Dog","Snake","Elephant","Bird","Rhino")
Code <- c(101,222,434,545,444,665)
Animal2 <- c("Cat","Dog","Snake","Elephant","Bird")
Code2 <- c(101,222,434,545,444)

Df1 <- tibble(Code, Animal)

Df2 <- tibble(Code2, Animal2) %>% rename(Animal = Animal2)
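Since the real data may be missing levels other than "Rhino", it can help to list the absent levels for each data frame before recoding. A minimal sketch, assuming the full set of expected levels is known in advance and stored in a vector `animals` (an assumption; the question only names these six animals), applies setdiff() to each Animal column:

```r
library(tibble)

# Assumed: every level the recode expects, known in advance
animals <- c("Cat", "Dog", "Snake", "Elephant", "Bird", "Rhino")

Df1 <- tibble(Code = c(101, 222, 434, 545, 444, 665),
              Animal = c("Cat", "Dog", "Snake", "Elephant", "Bird", "Rhino"))
Df2 <- tibble(Code2 = c(101, 222, 434, 545, 444),
              Animal = c("Cat", "Dog", "Snake", "Elephant", "Bird"))

# Expected levels that never occur in each data frame's Animal column
missing_levels <- lapply(list(Df1 = Df1, Df2 = Df2),
                         function(df) setdiff(animals, df$Animal))
missing_levels  # Df1: character(0); Df2: "Rhino"
```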

Solution

  • Here is one idea. I initially tried to give my function two arguments: one for the data frame and one for the column of animal names. That attempt failed with the error "Error in mutate_impl(.data, dots) : Column `new` must be length 5 (the number of rows) or one, not 6." So I dropped the column argument and hard-coded Animal inside the function, and then everything worked. The idea is to turn the column into a factor that also carries the missing animal names as levels, which I did with factor() and setdiff(). Once every animal name is a level, fct_collapse() can be applied as usual.

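    myfun <- function(mydf){
      # Every level the recode expects, whether or not it occurs in mydf
      animals <- c("Cat", "Dog", "Snake", "Elephant", "Bird", "Rhino")

      mydf %>%
        # Add the absent animals as (empty) factor levels...
        mutate(new = factor(Animal, levels = c(unique(Animal), setdiff(animals, Animal))),
               # ...so that fct_collapse() finds every level it is asked to collapse
               new = fct_collapse(new, One = c("Cat", "Dog", "Snake"),
                                       Two = c("Elephant", "Bird", "Rhino")))
    }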
    
    > myfun(Df2)
    # A tibble: 5 x 3
      Code2 Animal   new  
      <dbl> <chr>    <fct>
    1   101 Cat      One  
    2   222 Dog      One  
    3   434 Snake    One  
    4   545 Elephant Two  
    5   444 Bird     Two  
    
    > myfun(Df1)
    # A tibble: 6 x 3
       Code Animal   new  
      <dbl> <chr>    <fct>
    1   101 Cat      One  
    2   222 Dog      One  
    3   434 Snake    One  
    4   545 Elephant Two  
    5   444 Bird     Two  
    6   665 Rhino    Two  
    

    Memo: The following function is the same except that it takes the column as a second argument. It does not work. If you can see how to fix it, please let me know.

    myfun2 <- function(mydf, mycol){
      animals <- c("Cat", "Dog", "Snake", "Elephant", "Bird", "Rhino")

      mydf %>%
        mutate(new = factor(mycol, levels = c(unique(mycol), setdiff(animals, mycol))),
               new = fct_collapse(new, One = c("Cat", "Dog", "Snake"),
                                       Two = c("Elephant", "Bird", "Rhino")))
    }
    
    > myfun2(Df2, Animal)
    Error in mutate_impl(.data, dots) : 
    Column `new` must be length 5 (the number of rows) or one, not 6
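Why the two-argument version fails: inside mutate(), `mycol` is not a column of `mydf`, so R falls back to the function argument and evaluates the bare name `Animal` in the caller's environment. With the sample data loaded, that is the global six-element `Animal` vector, hence "must be length 5 ... not 6" for Df2. The same lookup is why `REC(Df1, Animal)` seems to work: Df1 happens to have the same six rows as the global vector. A minimal sketch of a fix, assuming rlang >= 0.4.0 for the embracing operator `{{ }}` (older code can use `enquo()` and `!!` instead), embraces the argument so mutate() looks the column up inside the data frame. The function name `collapse_animals()` is hypothetical.

```r
library(dplyr)
library(forcats)
library(tibble)

# Sample data from the question; the global `Animal` vector (length 6)
# is kept on purpose to show that it no longer leaks into mutate().
Animal <- c("Cat", "Dog", "Snake", "Elephant", "Bird", "Rhino")
Df1 <- tibble(Code = c(101, 222, 434, 545, 444, 665), Animal = Animal)
Df2 <- tibble(Code2 = c(101, 222, 434, 545, 444),
              Animal = c("Cat", "Dog", "Snake", "Elephant", "Bird"))

# Hypothetical helper: the two-argument version of myfun()
collapse_animals <- function(mydf, mycol) {
  # Every level the recode expects, whether or not it occurs in mydf
  animals <- c("Cat", "Dog", "Snake", "Elephant", "Bird", "Rhino")

  mydf %>%
    mutate(
      # {{ mycol }} makes mutate() resolve the column inside mydf instead of
      # evaluating the argument as an ordinary value in the caller's environment
      new = factor({{ mycol }},
                   levels = c(unique({{ mycol }}), setdiff(animals, {{ mycol }}))),
      new = fct_collapse(new,
                         One = c("Cat", "Dog", "Snake"),
                         Two = c("Elephant", "Bird", "Rhino"))
    )
}

collapse_animals(Df2, Animal)
```

Called this way, `collapse_animals(Df2, Animal)` returns five rows, like `myfun(Df2)`, even though a global `Animal` vector of length 6 exists.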