rdplyr

Replace multiple string values in dataframe with alternative strings


I have a large dataframe df1 with string values in multiple columns:

df1 <- 
    data.frame(col1 = rep(c("A", "B", "C"),3),
               col2 = rep(c("C", "A", "B"),3),
               col3 = 1:9)

  col1 col2 col3
1    A    C    1
2    B    A    2
3    C    B    3
4    A    C    4
5    B    A    5
6    C    B    6
7    A    C    7
8    B    A    8
9    C    B    9

I want to replace some of the string values with alternate values.

I have a second dataframe df2 with the values to be changed col1 and the alternate values col2.

df2 <- data.frame(col1 = c("A", "B"), 
                  col2 = c("D", "E"))

  col1 col2
1    A    D
2    B    E

So in this example I want all instances of "A" and "B" appearing in df1 to be replaced with "D" and "E" respectively, as per df2.

The final output would look like:

  col1 col2 col3
1    D    C    1
2    E    D    2
3    C    E    3
4    D    C    4
5    E    D    5
6    C    E    6
7    D    C    7
8    E    D    8
9    C    E    9

I have tried using code using across and lapply but I am having trouble linking to the second dataframe, where normally I would use a join.


Solution

  • You could make use of the superseded function recode: This is one of the uses that I am unable to replicate using case_match/ case_when. It was easy to use. Note that you could reverse the df ie rev(df2) and use fct_recode which is not superseded yet.

    df1 %>% 
       mutate(across(col1:col2, ~recode(.x, !!!deframe(df2))))
    
      col1 col2 col3
    1    D    C    1
    2    E    D    2
    3    C    E    3
    4    D    C    4
    5    E    D    5
    6    C    E    6
    7    D    C    7
    8    E    D    8
    9    C    E    9
    

    NB: If any knows how to replicate the same using case_match/case_when please go ahead add the solution


    you could also use str_replace_all Though It works in this scenario, it is not advisable since it might end up replacing portions of the strings instead of the whole string:

    df1 %>% 
      mutate(across(col1:col2, ~str_replace_all(.x, deframe(df2))))
    
      col1 col2 col3
    1    D    C    1
    2    E    D    2
    3    C    E    3
    4    D    C    4
    5    E    D    5
    6    C    E    6
    7    D    C    7
    8    E    D    8
    9    C    E    9