I have a large dataframe df1
with string values in multiple columns:
df1 <-
data.frame(col1 = rep(c("A", "B", "C"),3),
col2 = rep(c("C", "A", "B"),3),
col3 = 1:9)
col1 col2 col3
1 A C 1
2 B A 2
3 C B 3
4 A C 4
5 B A 5
6 C B 6
7 A C 7
8 B A 8
9 C B 9
I want to replace some of the string values with alternate values.
I have a second dataframe df2
with the values to be changed col1
and the alternate values col2
.
df2 <- data.frame(col1 = c("A", "B"),
col2 = c("D", "E"))
col1 col2
1 A D
2 B E
So in this example I want all instances of "A" and "B" appearing in df1
to be replaced with "D" and "E" respectively, as per df2
.
The final output would look like:
col1 col2 col3
1 D C 1
2 E D 2
3 C E 3
4 D C 4
5 E D 5
6 C E 6
7 D C 7
8 E D 8
9 C E 9
I have tried using code using across
and lapply
but I am having trouble linking to the second dataframe, where normally I would use a join.
You could make use of the superseded function recode
: This is one of the uses that I am unable to replicate using case_match/ case_when
. It was easy to use. Note that you could reverse the df
ie rev(df2)
and use fct_recode
which is not superseded
yet.
df1 %>%
mutate(across(col1:col2, ~recode(.x, !!!deframe(df2))))
col1 col2 col3
1 D C 1
2 E D 2
3 C E 3
4 D C 4
5 E D 5
6 C E 6
7 D C 7
8 E D 8
9 C E 9
NB: If any knows how to replicate the same using case_match/case_when
please go ahead add the solution
you could also use str_replace_all
Though It works in this scenario, it is not advisable since it might end up replacing portions of the strings instead of the whole string:
df1 %>%
mutate(across(col1:col2, ~str_replace_all(.x, deframe(df2))))
col1 col2 col3
1 D C 1
2 E D 2
3 C E 3
4 D C 4
5 E D 5
6 C E 6
7 D C 7
8 E D 8
9 C E 9