r stringr mutate

How to mutate all values except for a vector of selected values with case_when()


I'm cleaning a list of business names and I'm struggling to selectively convert the cases to title case. I can use the mutate(str_to_title(...)) functions to convert the whole field to title case, and that works great for most of my values, but there are a handful that are titled like "ABC Company" or "John Doe Company LLC", and when I apply title case, that messes their proper cases up ("Abc Company" and "John Doe Company Llc").

I thought I could use case_when() and a vector of specific values to create a function that tells R to only apply title case to values that do not equal the vector of values I specify. However, I either come up with a warning that "longer object length is not a multiple of shorter object length", and all the values are converted to title case, or I simply get NAs for the vector values in my field and correct title case values for the values not in my vector. Where am I going wrong?

# Example Code #

library(tidyverse)

## Reproducible Example ##

test<-structure(list(`Company Name` = c("ABC Company", "John Doe Company LLC", 
"rainbow road company", "yellow brick road incorporated", "XYZ", 
"Mostly Ghostly Company", "hot Leaf juice tea company")), class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -7L))

test<-test%>%
  mutate(`Company Name`= case_when(`Company Name`!= c("ABC Company","John Doe Company LLC","XYZ") ~ str_to_title(`Company Name`)))

# Error #
Warning message:
There was 1 warning in `mutate()`.
ℹ In argument: `Company Name = case_when(...)`.
Caused by warning in `` `Company Name` != c("ABC Company", "John Doe Company LLC", "XYZ") ``:
! longer object length is not a multiple of shorter object length 

Solution

  • However, I either come up with a warning that "longer object length is not a multiple of shorter object length", and all the values are converted to title case, or I simply get NAs for the vector values in my field and correct title case values for the values not in my vector. Where am I going wrong?

    When you mutate `Company Name` with case_when(), you need to specify a default case, and you should test against the list of exceptions with %in% rather than !=, like this:

    case_when(
        # `!` negates the test: TRUE for names that are NOT in the exception list
        !(`Company Name` %in% c("ABC Company", "John Doe Company LLC", "XYZ")) ~ str_to_title(`Company Name`),
        .default = `Company Name`
    )
    

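    Your example has no default, so every row where the condition is not TRUE falls through and is filled with NA. The warning itself comes from `!=`, which compares the two vectors element by element and recycles the shorter one, instead of checking each name against the whole list of exceptions.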
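
    To make the difference concrete, here is a minimal sketch on a plain character vector, using the names from the question. The variable names (`company`, `keep_as_is`, and so on) are just illustrative, and `.default` assumes dplyr 1.1.0 or later.

```r
library(dplyr)
library(stringr)

company <- c("ABC Company", "John Doe Company LLC", "rainbow road company",
             "yellow brick road incorporated", "XYZ",
             "Mostly Ghostly Company", "hot Leaf juice tea company")
keep_as_is <- c("ABC Company", "John Doe Company LLC", "XYZ")

# `!=` compares element by element and recycles the shorter vector, so
# element 5 ("XYZ") is compared with keep_as_is[2], not with "XYZ".
# (The recycling warning is suppressed here on purpose.)
recycled <- suppressWarnings(company != keep_as_is)

# `%in%` checks each company name against the whole set of exceptions.
is_exception <- company %in% keep_as_is

# Without a default, rows where no condition is TRUE become NA.
no_default <- case_when(!is_exception ~ str_to_title(company))

# With `.default`, the exceptions keep their original spelling.
with_default <- case_when(!is_exception ~ str_to_title(company),
                          .default = company)

recycled      # FALSE FALSE TRUE TRUE TRUE TRUE TRUE
is_exception  # TRUE TRUE FALSE FALSE TRUE FALSE FALSE
```

    Note that `recycled` is TRUE for "XYZ", which is why the `!=` version title-cases it to "Xyz" while returning NA for the first two names.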

    Alternatively, you can use a function that only title-cases words written entirely in lower case, which avoids having to define exceptions in the first place. I included both approaches below :)

    library(tidyverse)
    
    ## Reproducible Example ##
    
    test <- structure(list(`Company Name` = c("ABC Company", "John Doe Company LLC",
                                              "rainbow road company", "yellow brick road incorporated", "XYZ",
                                              "Mostly Ghostly Company", "hot Leaf juice tea company")),
                      class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -7L))
    
    # Function to capitalize words selectively
    capitalize_words <- function(input_string) {
      str_replace_all(input_string, "\\b[a-z][a-z]*\\b", function(word) {
        str_to_title(word)
      })
    }
    
    
    
    test <- test %>%
      mutate(
        `Capitalized Company Names case when` = case_when(
          !(`Company Name` %in% c("ABC Company", "John Doe Company LLC", "XYZ")) ~ str_to_title(`Company Name`),
          .default = `Company Name`
        ),
        `Capitalized Company Names with function` = capitalize_words(`Company Name`)
      )
    

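    Both new columns end up with the same result: the lower-case names are converted to title case (for example "Rainbow Road Company" and "Hot Leaf Juice Tea Company"), while "ABC Company", "John Doe Company LLC", and "XYZ" are left unchanged.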
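
    As a rough way to compare the two approaches, the sketch below reruns them on the same data plus one extra row, "IBM services". That row is a hypothetical addition of mine, deliberately left out of the exception list, and the column names `via_case_when` and `via_function` are shortened for illustration. It assumes dplyr 1.1.0 or later for `.default` and a stringr version that accepts a function as the replacement in str_replace_all().

```r
library(dplyr)
library(stringr)

# Same data as above, plus one hypothetical extra row ("IBM services")
# that is NOT in the exception list.
test <- tibble(`Company Name` = c(
  "ABC Company", "John Doe Company LLC", "rainbow road company",
  "yellow brick road incorporated", "XYZ", "Mostly Ghostly Company",
  "hot Leaf juice tea company", "IBM services"
))
exceptions <- c("ABC Company", "John Doe Company LLC", "XYZ")

# Title-case only the words written entirely in lower case.
capitalize_words <- function(input_string) {
  str_replace_all(input_string, "\\b[a-z]+\\b",
                  function(word) str_to_title(word))
}

result <- test %>%
  mutate(
    via_case_when = case_when(
      !(`Company Name` %in% exceptions) ~ str_to_title(`Company Name`),
      .default = `Company Name`
    ),
    via_function = capitalize_words(`Company Name`)
  )

# The two columns agree on the original seven rows ...
identical(result$via_case_when[1:7], result$via_function[1:7])

# ... and differ only on the row missing from the exception list.
result %>% filter(via_case_when != via_function)
```

    The case_when() version turns the unlisted name into "Ibm Services", while the function keeps it as "IBM Services". The trade-off is that the function leaves any word containing an upper-case letter untouched, so a name typed in all capitals would stay in all capitals.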