Tags: r, matching, fuzzy, agrep

Fuzzy mapping in R


I am trying to use the agrep command for fuzzy matching. I have one data frame in which a column contains the audience response, and another data frame in which segments and subsegments are listed. The audience column contains words that match the name of a subsegment. For example:

pattern$audience
[1] "(Deleted) Semasio » DE: Intent » Christmas Shopping"          
[2] "(Old) AddThis - UK » Auto » General » Auto Enthusiasts"      
[3] "(Old) AddThis - UK » Auto » General » Auto Intenders"        
[4] "(Old) AddThis - UK » Financial » Social » Financial Shoppers"
[5] "(Old) AddThis - UK » Food » Social"                           
[6] "(Old) AddThis - UK » Health » Social » Health Influencers" 

Similarly, I have another data frame called x that contains the segment and subsegment:

x$segment               x$subsegment
Shopping                Financial shoppers
Travel                  Travel Europe
Shopping                Christmas shopping

I want to write a function that does fuzzy matching between pattern$audience and x$subsegment and returns the matching subsegment for each audience response in a new column, pattern$subseg.

The resulting data set I need should be like this:

pattern$audience                                                    x$segment               x$subsegment
[1] "(Deleted) Semasio » DE: Intent » Christmas C"                  Shopping                Christmas shopping
[2] "(Old) AddThis - UK » Auto » General » Auto Enthusiasts"
[3] "(Old) AddThis - UK » Auto » General » Auto Intenders"
[4] "(Old) AddThis - UK » Financial » Social » Financial Shoppers"  Shopping                Financial shoppers
[5] "(Old) AddThis - UK » Food » Social"
[6] "(Old) AddThis - UK » Health » Social » Health Influencers"

Here is the code I tried, but it does not return the desired output:

x <- rename(x, c("Segment" = "segment", "Sub Segment" = "subseg"))
names(x)
y <- as.data.frame(x$subseg)
y <- rename(y, c("x$subseg" = "subseg"))


n.match <- function(pattern, x, ...) {
  for (i in 1:nrow(pattern)) {
        x <- (agrep(y,pattern$audience[i],
                 ignore.case=TRUE, value = TRUE))
              x <- paste0(x,"")
              pattern$subseg[i] <- x
  }
  head(pattern)
    }

Can someone please help me correct my mistake? I would really appreciate an answer. Many thanks.


Solution

  • We could try this:

    pattern <- c("(Deleted) Semasio » DE: Intent » Christmas C",          
             "(Old) AddThis - UK » Auto » General » Auto Enthusiasts",
             "(Old) AddThis - UK » Auto » General » Auto Intenders",        
             "(Old) AddThis - UK » Financial » Social » Financial Shoppers",
             "(Old) AddThis - UK » Food » Social",
             "(Old) AddThis - UK » Financial » Social » Financial Shoppers",
             "(Old) AddThis - UK » Health » Social » Health Influencers")
    pattern <- data.frame(audience=pattern)
    # strip.white = TRUE drops the padding around the comma-separated fields
    x <- read.csv(text='segment,   subsegment    
                           Shopping,   Financial shoppers
                           Travel,     Travel Europe
                           Enthusiasts, Auto Enthusiasts  
                           Shopping,   Christmas shopping', stringsAsFactors=FALSE, strip.white=TRUE)
    
    # SIMPLIFY = FALSE always returns a list with one index vector per subsegment
    vagrep <- Vectorize(agrep, 'pattern', SIMPLIFY = FALSE)
    pattern$subsegment <- ''
    # matches[[i]] holds the rows of pattern$audience that approximately contain x$subsegment[i]
    matches <- vagrep(x$subsegment, pattern$audience)
    for (i in seq_along(matches)) {
      if (length(matches[[i]]) > 0) pattern$subsegment[matches[[i]]] <- x$subsegment[i]
    }
    
    pattern
    #                                                      audience         subsegment
    #1                 (Deleted) Semasio » DE: Intent » Christmas C
    #2       (Old) AddThis - UK » Auto » General » Auto Enthusiasts   Auto Enthusiasts
    #3         (Old) AddThis - UK » Auto » General » Auto Intenders
    #4 (Old) AddThis - UK » Financial » Social » Financial Shoppers Financial shoppers
    #5                           (Old) AddThis - UK » Food » Social
    #6 (Old) AddThis - UK » Financial » Social » Financial Shoppers Financial shoppers
    #7    (Old) AddThis - UK » Health » Social » Health Influencers
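
The desired output in the question also carries the segment, and the question asks for a function. As a rough sketch of how the same idea could be wrapped up: agrep and agrepl take a single pattern string as their first argument and the vector to search as the second, so the helper below loops over the subsegments and searches all audience strings for each one. The name match_subsegment, its arguments, and the rule that the first matching row of the lookup table wins are assumptions made for illustration; they are not from the original answer.

```r
# Hypothetical helper, not part of the original answer.
# audience: character vector to search in.
# lookup:   data frame with character columns `segment` and `subsegment`.
match_subsegment <- function(audience, lookup, max.distance = 0.1) {
  audience <- as.character(audience)
  out <- data.frame(audience = audience, segment = "", subsegment = "",
                    stringsAsFactors = FALSE)
  for (i in seq_len(nrow(lookup))) {
    # TRUE where lookup$subsegment[i] approximately occurs inside the audience string
    hit <- agrepl(lookup$subsegment[i], audience,
                  max.distance = max.distance, ignore.case = TRUE)
    # Fill only rows that are still empty, so the first match in lookup order wins
    fill <- hit & out$subsegment == ""
    out$segment[fill] <- lookup$segment[i]
    out$subsegment[fill] <- lookup$subsegment[i]
  }
  out
}

# Small example built from the data shown in the question
audience <- c("(Deleted) Semasio » DE: Intent » Christmas Shopping",
              "(Old) AddThis - UK » Auto » General » Auto Enthusiasts",
              "(Old) AddThis - UK » Financial » Social » Financial Shoppers",
              "(Old) AddThis - UK » Food » Social")
lookup <- data.frame(segment = c("Shopping", "Travel", "Shopping"),
                     subsegment = c("Financial shoppers", "Travel Europe",
                                    "Christmas shopping"),
                     stringsAsFactors = FALSE)

result <- match_subsegment(audience, lookup)
result[, c("segment", "subsegment")]
```

In this example the first and third audience strings pick up "Christmas shopping" and "Financial shoppers" (both with segment "Shopping"), and the other two rows stay empty. If an audience string could match several subsegments, a stricter max.distance or a ranking by edit distance may be a better choice than "first match wins".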