r, string, left-join, multiple-instances

R: replace multiple occurrences of regex-matched strings in dataframe fields by looking them up in another dataframe


I have two dataframes:

df lookup:

oldId <- c(123, 456, 567, 789)
newId <- c(1, 2, 3, 4)
lookup <- data.frame(oldId, newId)

df data:

descr <- c("description with no match",
           "description with one 123 match",
           "description with again no match",
           "description 456 with two 789 matches")
data <- data.frame(descr)

Goal:

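I want a new dataframe in which every number in descr that matches an oldId in lookup is replaced by the corresponding newId. The resulting descr column will thus look like this: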

  1. "description with no match"
  2. "description with one 1 match"
  3. "description with again no match"
  4. "description 2 with two 4 matches"

So, each text in the descr column may contain many numbers that need to be replaced. Of course, this is a stripped-down example; my real-life dataframes are much bigger.

I do have the regex-part fixed:

fx <- function(x) {gsub("([[:digit:]]{3})", "TESTTEST", x)}
# gsub is vectorised, so lapply is not needed (it would turn the column into a list)
data$descr <- fx(data$descr)

But I have no idea how to let the function loop over all matches in a row, and then let it look up the number and replace it.


Solution

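  • A base R approach can use Reduce, which runs one gsub per row of lookup and feeds each result into the next replacement (the \(x, i) lambda shorthand requires R 4.1 or later):

    Reduce(
      # x is the character vector so far, i is the current row of lookup
      \(x, i) gsub(lookup$oldId[i], lookup$newId[i], x),
      seq_along(lookup$oldId),
      # start from the original descriptions
      init = descr
    )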
    

    Output:

    [1] "description with no match"        "description with one 1 match"    
    [3] "description with again no match"  "description 2 with two 4 matches"
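
  • A possible single-pass variant, sketched here under a few assumptions: the ids in descr are standalone three-digit numbers (as in the question's regex), numbers without an entry in lookup should stay unchanged, and replace_ids is a hypothetical helper name rather than an existing function. It uses gregexpr and regmatches to look up every match once, so a value that has already been replaced cannot be matched again by a later lookup row:

```r
# Sample data from the question
lookup <- data.frame(oldId = c(123, 456, 567, 789), newId = c(1, 2, 3, 4))
descr <- c("description with no match",
           "description with one 123 match",
           "description with again no match",
           "description 456 with two 789 matches")

# replace_ids is a hypothetical helper name, not part of base R
replace_ids <- function(x, lookup) {
  # Locate every standalone three-digit number in each string
  m <- gregexpr("\\b[[:digit:]]{3}\\b", x, perl = TRUE)
  regmatches(x, m) <- lapply(regmatches(x, m), function(ids) {
    # Look up each matched id in the lookup table
    new <- as.character(lookup$newId[match(ids, as.character(lookup$oldId))])
    # Keep the original text where the lookup has no entry
    new[is.na(new)] <- ids[is.na(new)]
    new
  })
  x
}

replace_ids(descr, lookup)
```

    With the sample data this gives the same four strings as the Reduce output above. A longer number such as 1234 is left untouched because of the \b word boundaries.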