r dplyr fuzzy-search

How to find whether text strings in one column appear in another column?


Below is the sample data:

 df1 <- c("Board of Accountancy", "Board of Economists", "Board of Medicine")
 df2 <- c("State Board of Accountancy", "The State Board of Economists", "State Board of Law")

The task at hand is twofold. First, search df2 for the text strings in df1. The entries of df2 are left alone either way, and any df1 string that is not found in df2 is appended, giving an end result such as the one below. This is related to a question I asked yesterday, but on closer examination my first job is to find whether the names in df1 appear in df2.

df3: "State Board of Accountancy", "The State Board of Economists", "State Board of Law", "Board of Medicine"

Solution

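  • c(df2, df1[colSums(sapply(df1, grepl, df2)) < 1])
    # [1] "State Board of Accountancy"    "The State Board of Economists" "State Board of Law"            "Board of Medicine"
    df3
    # [1] "State Board of Accountancy"    "The State Board of Economists" "State Board of Law"            "Board of Medicine"

    Walk-through: sapply(df1, grepl, df2) returns a logical matrix with one row per element of df2 and one column per element of df1. colSums() therefore counts, for each df1 string, how many df2 entries contain it, and < 1 keeps the df1 strings that match nothing. Those are appended to df2 with c(). Note that grepl() treats each df1 string as a regular expression unless fixed = TRUE is passed.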
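
    The intermediate objects can be inspected one step at a time. The following is a minimal sketch on the sample vectors; the names hits, n_found, not_found and result are illustrative and not part of the original one-liner.

    ```r
    df1 <- c("Board of Accountancy", "Board of Economists", "Board of Medicine")
    df2 <- c("State Board of Accountancy", "The State Board of Economists", "State Board of Law")

    # Logical matrix: one row per df2 entry, one column per df1 pattern
    hits <- sapply(df1, grepl, df2)

    # For each df1 string, the number of df2 entries that contain it
    n_found <- colSums(hits)

    # df1 strings that appear in no df2 entry
    not_found <- df1[n_found < 1]

    # Keep df2 as it is and append the unmatched df1 strings
    result <- c(df2, not_found)
    ```

    Printing hits shows which df2 entry each df1 string was found in, which makes it easier to check the matching before combining the vectors.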


    Corrected data:

    df1 <- c("Board of Accountancy", "Board of Economists", "Board of Medicine")
    df2 <- c("State Board of Accountancy", "The State Board of Economists", "State Board of Law")
    df3 <- c("State Board of Accountancy", "The State Board of Economists", "State Board of Law", "Board of Medicine")
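
    If the two vectors can differ in length, or the names may contain regex metacharacters such as parentheses or dots, the same idea can be wrapped in a small function. This is only a sketch under those assumptions: append_unmatched() is a hypothetical helper name, and fixed = TRUE is a choice made here to force literal substring matching.

    ```r
    # Hypothetical helper: append to `targets` every string in `patterns`
    # that is not a literal substring of any element of `targets`.
    append_unmatched <- function(patterns, targets) {
      # One TRUE/FALSE per pattern; any() on an empty result is FALSE
      found <- vapply(
        patterns,
        function(p) any(grepl(p, targets, fixed = TRUE)),
        logical(1),
        USE.NAMES = FALSE
      )
      c(targets, patterns[!found])
    }

    df1 <- c("Board of Accountancy", "Board of Economists", "Board of Medicine")
    df2 <- c("State Board of Accountancy", "The State Board of Economists", "State Board of Law")
    df3 <- append_unmatched(df1, df2)

    # The vectors do not need to have the same length
    shorter <- append_unmatched(c("Board of Law", "Board of Nursing"), df2)
    ```

    Here df3 should match the expected result from the question, and shorter should be df2 followed by "Board of Nursing", since "Board of Law" is already contained in "State Board of Law".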