
Duplicates in agrep function


I have the following code:

x <- data.frame("SN" = 1:2, "Name" = c("aaa","bbb"))

y <- data.frame("SN" = 1:2, "Name" = c("aa1","aa2"))

x$partials <- as.character(sapply(x$Name, agrep, y$Name, max.distance = 1, value = TRUE))

x

The output is the following:

> x
  SN Name        partials
1  1  aaa c("aa1", "aa2")
2  2  bbb    character(0)
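The deparsed strings come from how the pieces fit together: agrep() returns a vector of a different length for each name (two matches for "aaa", none for "bbb"), so sapply() cannot simplify its result and returns a list, and as.character() applied to a list deparses each element into one string. A small check with the same data (stringsAsFactors = FALSE is an addition of this sketch, so it behaves the same on R versions before 4.0):

```r
x <- data.frame(SN = 1:2, Name = c("aaa", "bbb"), stringsAsFactors = FALSE)
y <- data.frame(SN = 1:2, Name = c("aa1", "aa2"), stringsAsFactors = FALSE)

# One agrep() call per name; the results have different lengths
m <- sapply(x$Name, agrep, y$Name, max.distance = 1, value = TRUE)
is.list(m)       # TRUE: sapply() could not simplify to a vector or matrix
lengths(m)       # aaa = 2, bbb = 0
as.character(m)  # each list element deparsed into a single string
```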

However, I am expecting the following output:

[Image: expected output]

Any ideas?


Solution

  • You are probably looking for one row per approximate match, with NA for names that have no match.

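    First, agrep() returns character(0) when a name has no match, and that empty result is lost in the following steps (rbind() ignores zero-length vectors), so "bbb" would silently drop out. To keep it, return NA instead, or the text "character(0)" if you really want that.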

    z <- setNames(sapply(x$Name, function(a) {
      ag <- agrep(a, y$Name, max.distance = 1, value = TRUE)
      if (length(ag) == 0) NA  # no match: use NA (or "character(0)" if you prefer)
      else ag
    }, simplify = FALSE), x$Name)  # simplify = FALSE: always return a list for rbind()
    

    Then, bind the list into a matrix with one row per name. Shorter results (here the single NA for "bbb") are recycled to fill their row.

    z <- do.call(rbind, z)
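To see why the NA placeholder matters, here is a small standalone check of how rbind() treats the list with and without it. The names without_na and with_na are hypothetical; their values are the agrep() results from the question. A zero-length element is ignored, so its row vanishes; a single NA survives and is recycled across the row, which is also where the duplicate rows removed later by unique() come from.

```r
# agrep() results for "aaa" and "bbb", without and with the NA placeholder
without_na <- list(aaa = c("aa1", "aa2"), bbb = character(0))
with_na    <- list(aaa = c("aa1", "aa2"), bbb = NA_character_)

m1 <- do.call(rbind, without_na)
m2 <- do.call(rbind, with_na)

nrow(m1)      # 1: rbind() ignored the zero-length element, "bbb" is gone
rownames(m2)  # "aaa" "bbb"
m2["bbb", ]   # NA NA: the single NA was recycled across both columns
```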
    

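    We need to melt() it into long format (one row per name/match pair). A good way is with data.table. Its melt() is meant for data.tables (for a plain matrix it falls back to reshape2, a redirection data.table has deprecated), so convert the matrix first and keep its row names as the Name column.

    library(data.table)
    z <- melt(as.data.table(z, keep.rownames = "Name"), id.vars = "Name",
              value.name = "partials")[, c("Name", "partials")]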
    

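    Finally, merge x with the long table on Name. unique(z) drops the duplicated NA rows that recycling produced for "bbb", and [c(2, 1, 3)] restores the SN, Name, partials column order (merge() puts the key column first).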

    res <- merge(x, unique(z))[c(2, 1, 3)]
    
    > res
      SN Name partials
    1  1  aaa      aa1
    2  1  aaa      aa2
    3  2  bbb     <NA>
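If you would rather skip the matrix and melt round trip, one possible alternative (a base R sketch using rep() and lengths(), not the approach above) is to build the long table directly: repeat each row of x once per match and attach the matches in the same order. The variable name matches is only illustrative, and stringsAsFactors = FALSE is added so the sketch also behaves the same before R 4.0.

```r
x <- data.frame(SN = 1:2, Name = c("aaa", "bbb"), stringsAsFactors = FALSE)
y <- data.frame(SN = 1:2, Name = c("aa1", "aa2"), stringsAsFactors = FALSE)

# One character vector of approximate matches per name
matches <- lapply(x$Name, agrep, y$Name, max.distance = 1, value = TRUE)

# Unmatched names get a single NA so they still produce one row
matches <- lapply(matches, function(m) if (length(m) == 0) NA_character_ else m)

# Repeat each row of x once per match, then attach the matches in that order
res <- x[rep(seq_len(nrow(x)), lengths(matches)), ]
res$partials <- unlist(matches)
rownames(res) <- NULL
res
```

This gives the same three rows as res above, without needing data.table or a merge.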