My Google-fu must be failing as I feel someone must've had this problem before and I can't find a solution.
Let's say I have the following data:
df1 <- data.frame(Name = c("Banana", "Apple", "Sponge", "Donut", "Bleach", "Wine"),
Value = c("Fruit", "Fruit", "Cleaner", "Dessert", "Cleaner", "Party"))
df2 <- data.frame(Name = c("Ban", "Ap", "Do", "Wi"),
Value = c("F", "F", "D", "P"))
I need to be able to do a fuzzy string match on df1$Name with df2$Name, and where they match replace df1$Value with df2$Value where $Name fuzzy matches. If there is no match, I want to retain the value from df1. Such that, my output would look like this:
Name | Value |
---|---|
Banana | F |
Apple | F |
Sponge | Cleaner |
Donut | D |
Bleach | Cleaner |
Wine | P |
I want anything starting with "Ban" to be F. I don't care if it says Bananakin or anything after df2$Name. The number of rows in df1 is in the thousands, df2 is merely 67.
Further, df1 has additional columns that aren't relevant but I would like to keep.
The closest code I've found to what I want is this:
df3 <- df1 %>% mutate(across(c(Name), ~if_else(str_detect(Name, df2$Name), str_replace(Value .,df2$Name)))
However I get the following error:
Error in
mutate()
: ℹ In argument:across(...)
. Caused by error instr_detect()
: ! Can't recyclestring
(size 1362) to matchpattern
(size 67). Runrlang::last_trace()
to see where the error occurred.
I get the same error for this bit of code as well:
df1$Value <- str_replace_all(df1$Value, df1$Name == df2$Name, df2$Value)
I've also tried:
df1 <- df1 %>% mutate(across(everything(), ~deframe(df2[.])))
What am I missing? I have a feeling it's something simple I'm just not seeing. I'm relatively new to R, and would greatly appreciate the help!
df1 %>%
mutate(Name1 = Name %>%
str_replace_all(set_names(df2$Value, str_c(df2$Name, ".*"))),
Value = ifelse(Name == Name1, Value, Name1),
Name1 = NULL)
Name Value
1 Banana F
2 Apple F
3 Sponge Cleaner
4 Donut D
5 Bleach Cleaner
6 Wine P
fuzzyjoin::regex_left_join(df1, df2, 'Name') %>%
reframe(Name = Name.x, Value = coalesce(Value.y, Value.x))
Name Value
1 Banana F
2 Apple F
3 Sponge Cleaner
4 Donut D
5 Bleach Cleaner
6 Wine P