library(tidyverse)
library(forcats)
I have two simple data frames (code at bottom), and I want to create a new recoded variable by collapsing the "Animal" column. I usually do this with forcats::fct_collapse. However, I want to write a function that applies fct_collapse to many different data frames that share the same variables, except that some may be missing one or two of the factor levels. For example, in this case, Df2 is missing "Rhino".
Is there a way to change the code (using tidyverse) so that factor categories that are missing will be returned as NA? In this example I know it's "Rhino", but in my real data other levels may be missing. I'm open to options other than forcats::fct_collapse, but I would like to stay within the tidyverse.
REC <- function(Df, Data){
  Df %>%
    mutate(NEW = fct_collapse(Data, One = c("Cat","Dog","Snake"),
                              Two = c("Elephant","Bird","Rhino")))
}
REC(Df1, Animal)  # this works
REC(Df2, Animal)  # this doesn't; it throws an error because of "Rhino"
Sample Data:
Animal <- c("Cat","Dog","Snake","Elephant","Bird","Rhino")
Code <- c(101,222,434,545,444,665)
Animal2 <- c("Cat","Dog","Snake","Elephant","Bird")
Code2 <- c(101,222,434,545,444)
Df1 <- tibble(Code, Animal)
Df2 <- tibble(Code2, Animal2) %>% rename(Animal = Animal2)
Here is one idea. I first tried a function with two arguments: one for the data frame and one for the column containing the animal names. That attempt failed with the error "Error in mutate_impl(.data, dots) : Column `new` must be length 5 (the number of rows) or one, not 6." So I dropped the column argument and referred to Animal directly inside the function, and then things worked. The idea is to build a factor whose levels also include the animal names that are missing from the data; factor() together with setdiff() does that. Once the factor has all animal names as levels, fct_collapse() can be applied.
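The level-padding step can be seen in isolation with base R alone. This is only a minimal sketch; the names `all_animals`, `observed`, `missing_names`, and `padded` are made up for illustration and do not appear in the function that follows.

```r
# All names that may occur, and the names actually observed (no "Rhino")
all_animals <- c("Cat", "Dog", "Snake", "Elephant", "Bird", "Rhino")
observed <- c("Cat", "Dog", "Snake", "Elephant", "Bird")

# setdiff() returns the names that never occur in the data
missing_names <- setdiff(all_animals, observed)

# Append them as extra levels; no value changes, the level is simply unused
padded <- factor(observed, levels = c(unique(observed), missing_names))

levels(padded)
```

Because "Rhino" is now a declared (if unused) level, collapsing it into "Two" no longer refers to a level the factor does not have.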
myfun <- function(mydf){
  # Every animal name that may appear across the data frames
  animals <- c("Cat", "Dog", "Snake", "Elephant", "Bird", "Rhino")
  mydf %>%
    # Add the names absent from Animal as extra (unused) levels, then collapse
    mutate(new = factor(Animal, levels = c(unique(Animal), setdiff(animals, Animal))),
           new = fct_collapse(new, One = c("Cat", "Dog", "Snake"),
                              Two = c("Elephant", "Bird", "Rhino")))
}
> myfun(Df2)
# A tibble: 5 x 3
Code2 Animal new
<dbl> <chr> <fct>
1 101 Cat One
2 222 Dog One
3 434 Snake One
4 545 Elephant Two
5 444 Bird Two
> myfun(Df1)
# A tibble: 6 x 3
Code Animal new
<dbl> <chr> <fct>
1 101 Cat One
2 222 Dog One
3 434 Snake One
4 545 Elephant Two
5 444 Bird Two
6 665 Rhino Two
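If only the recoded column is needed, one alternative that sidesteps factor levels altogether is to match on the character values. This is a sketch of a different technique (dplyr::case_when() with %in%), not the approach used above; the function name `recode_animals` and the tibble `demo` are hypothetical.

```r
library(dplyr)

recode_animals <- function(df) {
  df %>%
    mutate(new = case_when(
      Animal %in% c("Cat", "Dog", "Snake") ~ "One",
      Animal %in% c("Elephant", "Bird", "Rhino") ~ "Two"
      # a value matching neither group is left as NA
    ))
}

# "Horse" belongs to neither group, so it is recoded to NA
demo <- tibble(Animal = c("Cat", "Bird", "Horse"))
recode_animals(demo)$new
```

Note that `new` is a character column here rather than a factor; wrapping the case_when() call in factor() would convert it if a factor is required.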
Memo: The following function is the same, except that it takes two arguments (the data frame and the column). It does not work. If anyone can suggest a revision, please let me know.
myfun2 <- function(mydf, mycol){
  animals <- c("Cat", "Dog", "Snake", "Elephant", "Bird", "Rhino")
  mydf %>%
    mutate(new = factor(mycol, levels = c(unique(mycol), setdiff(animals, mycol))),
           new = fct_collapse(new, One = c("Cat", "Dog", "Snake"),
                              Two = c("Elephant", "Bird", "Rhino")))
}
> myfun2(Df2, Animal)
Error in mutate_impl(.data, dots) :
Column `new` must be length 5 (the number of rows) or one, not 6
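A likely cause of this error: inside mutate(), mycol is not a column of mydf, so R falls back to the function argument, which evaluates Animal in the calling environment. With the sample data that is the six-element global vector Animal, not the five-row column, which matches the "length 5 ... not 6" message. One possible revision, assuming dplyr 1.0 or later (with rlang 0.4 or later), is to embrace the argument with {{ }} so that it is looked up in the data frame. The name `myfun3` is hypothetical.

```r
library(dplyr)
library(forcats)

myfun3 <- function(mydf, mycol) {
  animals <- c("Cat", "Dog", "Snake", "Elephant", "Bird", "Rhino")
  mydf %>%
    mutate(
      # {{ mycol }} refers to the column in mydf, not to a global object
      new = factor({{ mycol }},
                   levels = c(unique({{ mycol }}), setdiff(animals, {{ mycol }}))),
      new = fct_collapse(new,
                         One = c("Cat", "Dog", "Snake"),
                         Two = c("Elephant", "Bird", "Rhino"))
    )
}

# Sample data mirroring Df2 from the question (no "Rhino")
Df2 <- tibble(Code2 = c(101, 222, 434, 545, 444),
              Animal = c("Cat", "Dog", "Snake", "Elephant", "Bird"))
res <- myfun3(Df2, Animal)
res
```

The column is still passed as a bare name, as in myfun2(Df2, Animal), so callers do not need to change how they invoke the function.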