I'm cleaning a list of business names and I'm struggling to selectively convert the cases to title case. I can use mutate(str_to_title(...)) to convert the whole field to title case, and that works great for most of my values, but there are a handful that are titled like "ABC Company" or "John Doe Company LLC", and applying title case messes up their proper casing ("Abc Company" and "John Doe Company Llc").
I thought I could use case_when() and a vector of specific values to tell R to apply title case only to values that do not equal any of the values I specify. However, I either get a warning that "longer object length is not a multiple of shorter object length" and all the values are converted to title case, or I simply get NAs for the vector values in my field and correct title case values for the values not in my vector. Where am I going wrong?
# Example Code #
library(tidyverse)
## Reproducible Example ##
test<-structure(list(`Company Name` = c("ABC Company", "John Doe Company LLC",
"rainbow road company", "yellow brick road incorporated", "XYZ",
"Mostly Ghostly Company", "hot Leaf juice tea company")), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -7L))
test<-test%>%
mutate(`Company Name`= case_when(`Company Name`!= c("ABC Company","John Doe Company LLC","XYZ") ~ str_to_title(`Company Name`)))
# Error #
Warning message:
There was 1 warning in `mutate()`.
ℹ In argument: `Company Name = case_when(...)`.
Caused by warning in `` `Company Name` != c("ABC Company", "John Doe Company LLC", "XYZ") ``:
! longer object length is not a multiple of shorter object length
When you mutate Company Name with case_when(), you need to specify a default case like this:
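case_when(
  # ! negates the %in% test, so this branch matches names that are NOT in the exception vector
  !(`Company Name` %in% c("ABC Company", "John Doe Company LLC", "XYZ")) ~ str_to_title(`Company Name`),
  # all remaining rows (the exceptions) keep their original value; .default needs dplyr 1.1.0 or later
  .default = `Company Name`
)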
Since .default was missing in your example, rows where the condition is not TRUE have nothing to fall back on and are therefore filled with NA.
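As for the warning itself: it comes from `!=`, which compares the two vectors position by position and recycles the shorter one, rather than checking membership. Here is a minimal base R sketch of the difference, using the names from the question (`company` and `keep_as_is` are just illustrative variable names, not anything from your code):

```r
# Base R only; values copied from the question's example
company <- c("ABC Company", "John Doe Company LLC", "rainbow road company",
             "yellow brick road incorporated", "XYZ",
             "Mostly Ghostly Company", "hot Leaf juice tea company")
keep_as_is <- c("ABC Company", "John Doe Company LLC", "XYZ")  # illustrative name

# != compares position by position and recycles the 3 values of keep_as_is
# along the 7 values of company, which triggers the
# "longer object length is not a multiple of shorter object length" warning
by_position <- suppressWarnings(company != keep_as_is)

# %in% asks, for each company name, whether it appears anywhere in keep_as_is
by_membership <- !(company %in% keep_as_is)

print(by_position)    # FALSE FALSE TRUE TRUE TRUE TRUE TRUE
print(by_membership)  # FALSE FALSE TRUE TRUE FALSE TRUE TRUE
```

With `!=`, "XYZ" in position 5 is compared against "John Doe Company LLC", so it counts as "not an exception" and would be title-cased. The first two rows only come out FALSE because they happen to sit in the same positions as in the vector, and without a default those are the rows that turn into NA.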
Alternatively, you can use a function that capitalizes only words written entirely in lower case, which removes the need to define exceptions in the first place. I included both approaches below :)
library(tidyverse)
## Reproducible Example ##
test<-structure(list(`Company Name` = c("ABC Company", "John Doe Company LLC",
"rainbow road company", "yellow brick road incorporated", "XYZ",
"Mostly Ghostly Company", "hot Leaf juice tea company")), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -7L))
# Function to capitalize words selectively
capitalize_words <- function(input_string) {
  # \\b[a-z]+\\b matches only words made up entirely of lowercase letters,
  # so "ABC", "LLC" and "Leaf" are left untouched
  str_replace_all(input_string, "\\b[a-z]+\\b", str_to_title)
}
test <- test %>%
  mutate(
    `Capitalized Company Names case when` = case_when(
      !(`Company Name` %in% c("ABC Company", "John Doe Company LLC", "XYZ")) ~ str_to_title(`Company Name`),
      .default = `Company Name`),
    `Capitalized Company Names with function` = capitalize_words(`Company Name`))
and end up with this result: