I'm working with RecordLinkage package version 0.4-12 and R version 4.0.0.
My data linkage code, which I have used in the past, is no longer running the same way it did under R version 3.6.3.
library(tidyverse)
library(RecordLinkage)
data("RLdata500")
data("RLdata10000")
# Creating package datasets to link: dat1 and dat2
dat1 <- RLdata500
dat2 <- bind_rows(RLdata500, RLdata10000)
The code for the two linkages below is identical except for the strcmpfun argument, which is set to either "jarowinkler" or "levenshtein".
The "levenshtein" code runs fine, but the "jarowinkler" linkage fails to produce any results for "allpairs_jw".
# Jaro-Winkler with Package data
rpairs <- RLBigDataLinkage(dat1, dat2,
                           strcmp = TRUE,
                           strcmpfun = "jarowinkler",
                           exclude = c("fname_c2", "lname_c2"))
epi <- epiWeights(rpairs)
allpairs_jw <- getPairs(epi, min.weight = 0.80)
# Levenshtein with Package data
rpairs <- RLBigDataLinkage(dat1, dat2,
                           strcmp = TRUE,
                           strcmpfun = "levenshtein",
                           exclude = c("fname_c2", "lname_c2"))
epi <- epiWeights(rpairs)
allpairs_lv <- getPairs(epi, min.weight = 0.80)
> head(allpairs_jw)
[1] id fname_c1 fname_c2 lname_c1 lname_c2 by bm bd is_match
<0 rows> (or 0-length row.names)
> head(allpairs_lv)
id fname_c1 fname_c2 lname_c1 lname_c2 by bm bd is_match Weight
1 1 CARSTEN <NA> MEIER <NA> 1949 7 22
2 1 CARSTEN <NA> MEIER <NA> 1949 7 22 <NA> 1.0000000
3
4 2 GERD <NA> BAUER <NA> 1968 7 27
5 2 GERD <NA> BAUER <NA> 1968 7 27 <NA> 1.0000000
6
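One way to narrow this down might be to call the two string comparators directly, outside of RLBigDataLinkage. This is only a sketch: the name pairs are made up for illustration, and it exercises the R-level jarowinkler() and levenshteinSim() functions exported by the package, which may not be the same code path that RLBigDataLinkage uses internally.

```r
library(RecordLinkage)

# Hypothetical name pairs, chosen only for illustration
a <- c("CARSTEN", "GERD", "MEIER")
b <- c("CARSTEN", "GERT", "MEYER")

# Both functions return a similarity in [0, 1]; identical strings score 1
cmp <- data.frame(a, b,
                  jw = jarowinkler(a, b),
                  lv = levenshteinSim(a, b))
print(cmp)
```

If both columns look sensible here, the problem is more likely in how the big-data linkage applies the comparator than in the comparator itself.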
Any guidance would be greatly appreciated.
There was an apparent bug in the underlying package code. The package maintainer has fixed the issue and pushed the fix to CRAN. I've tested the updated package against other record linkage packages and it seems to be working correctly in my current environment:
R version 4.0.2 (2020-06-22) RStudio Version 1.3.959
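For anyone hitting the same problem, a rough way to check the installed version, update from CRAN, and re-run the failing linkage is sketched below. It assumes the installed version is still the 0.4-12 release mentioned in the question (or older); I have not noted which release contains the fix, so the version comparison is only a guess at when an update is needed.

```r
# Show the R and package versions currently in use
print(R.version.string)
print(packageVersion("RecordLinkage"))

# Assumption: anything at or below 0.4-12 predates the fix
if (packageVersion("RecordLinkage") <= "0.4-12") {
  install.packages("RecordLinkage")
}

library(RecordLinkage)
data("RLdata500")
data("RLdata10000")
dat1 <- RLdata500
dat2 <- rbind(RLdata500, RLdata10000)

# Re-run the Jaro-Winkler linkage from the question
rpairs <- RLBigDataLinkage(dat1, dat2,
                           strcmp = TRUE,
                           strcmpfun = "jarowinkler",
                           exclude = c("fname_c2", "lname_c2"))
epi <- epiWeights(rpairs)
allpairs_jw <- getPairs(epi, min.weight = 0.80)

# A non-zero row count suggests the comparator is producing weights again
print(nrow(allpairs_jw))
```

If the package was just reinstalled, restart the R session before re-running so the new version is the one that gets loaded.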