I have two data frames I would like to merge
a<- data.frame(x=c(1,4,6,8,1,6,7,2),ID=c("132","14.","732","2..","132","14.","732","2.."),year=c(1,1,1,1,2,2,2,2))
b<- data.frame(y=c(2,7,5,5,1,1,2,3),ID=c("132","144","732","290","132","144","732","290"),year=c(1,1,1,1,2,2,2,2))
The ID Variable by which I would like to merge the two data frames is not completely known in data set a. I also want to merge by year. They are known up to a fully identifying regular expression. Note, there is a one-to-one match. In this example you would not find an ID "1.." in the data set so there are no ambiguous matches.
I would like to get something like this:
output<-data.frame(y=c(2,7,5,5,1,1,2,3),x=c(1,4,6,8,1,6,7,2),ID=c("132","144","732","290","132","144","732","290"), year=c(1,1,1,1,2,2,2,2))
I tried to remove the regular expression part with substr and then use starts_with in the merge but it does not work.
I get the following error message
Coercing pattern to a plain character vector
when I do this:
df_complete <- regex_inner_join(b,a, by=c("ID","year"))
Thank you stack overflow...
Answer from Comment by @jblood94
With a
and b
as data.table
s: a[, regex_inner_join(b[year == .BY], .SD, by = "ID"), year]
–
jblood94