I have two tibbles with a list of names and their scores on one of two tests. I want to merge these dataframes so each row consists of a name, and their scores on test one and test two.
df1<-tibble(person=c("Joe", "John", "Albert", "Joe", "John"), test_1=c(90, 91, 92,93, 94 ))
df2<-tibble(person=c("Joe", "John", "Albert", "Joe"), test_2=c(94, 70, 93,50))
df3<-inner_join(df1, df2, by="person")
I would like to get this:
df4<-tibble(person=c("Joe", "Albert", "Joe"), test_1=c(90, 92, 93), test_2=c(94, 93, 94)
However, when I run the first code block, I get four different rows for person "Joe" even though there are only two observations of "Joe" in each dataframe. I only want to "match" two persons with the same name (like "Joe") if their test scores are within five points of each other.
Is there a way to do this? I would appreciate any help!
Note: I accidentally rollbacked this post when trying to edit, which removed someone's answer so I am reposting it. Very sorry to the person whose answer was removed!
Given your expected output then my original post would work. Just filter after doing a many-to-many join:
library(dplyr)
inner_join(df1, df2, by = "person", relationship = 'many-to-many') |>
filter(abs(test_1 - test_2) <= 5 | n() == 1, .by = person)
the .by
is doing the filter by person. So then n()
represents the number of rows within each by group. This additional condition is ensuring that rows with a one-to-one match will be returned no matter how far apart test_1
and test_2
are. For example, if Albert had a test_1 score of 92 and a test_2 score of 0 that row would not be filtered out because there is only one Albert.