Tags: r, dplyr, tidyverse

Check if element of one dataframe is in another dataframe, within group


Say that I have these data:

library(dplyr)
df1 <- data.frame(x = c(1, 2, 3, 4), z = c("A", "A", "B", "B"))
df2 <- data.frame(x = c(2, 4, 6, 8), z = c("A", "A", "B", "C"))

I can easily check if each element of x in df1 is present in x of df2:

df1 <- df1 %>% mutate(present = x %in% df2$x)

Is there an easy way to do the same thing (preferably in the tidyverse), but checking only within each group?

In other words, for an observation in df1 to have present be TRUE, two things must be true: 1) the group (z) in df2 must be the same as the group in df1 and 2) the value of x in df2 must be the same as the value in df1.

So only the second observation (x = 2) would be TRUE, because df2 contains an observation with an x of 2 and a z of A. The last observation (x = 4) would be FALSE: df2 does contain an x of 4, but that observation is in group A, not B.


Solution

  • This works on your example data, though it seems inelegant.

    library(dplyr)
    df1 <- data.frame(x = c(1, 2, 3, 4), z = c("A", "A", "B", "B"))
    df2 <- data.frame(x = c(2, 4, 6, 8), z = c("A", "A", "B", "C"))
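    # rowwise() evaluates the expression one row of df1 at a time, so `z` is that
    # row's group and only the x values of df2 in the same group are checked
    df1 |> rowwise() |> mutate(present = x %in% df2[df2$z == z, "x"])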
    #> # A tibble: 4 × 3
    #> # Rowwise: 
    #>       x z     present
    #>   <dbl> <chr> <lgl>  
    #> 1     1 A     FALSE  
    #> 2     2 A     TRUE   
    #> 3     3 B     FALSE  
    #> 4     4 B     FALSE
    

    Created on 2024-11-30 with reprex v2.1.1
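
If the row-by-row lookup feels clumsy, the same check can probably be written as a join on both columns. This is only a sketch on the example data: `df2_keys` and `result` are names made up here, and it assumes `x` and `z` have the same types in both data frames. `distinct()` drops repeated (x, z) pairs from df2 so the join cannot duplicate rows of df1.

```r
library(dplyr)

df1 <- data.frame(x = c(1, 2, 3, 4), z = c("A", "A", "B", "B"))
df2 <- data.frame(x = c(2, 4, 6, 8), z = c("A", "A", "B", "C"))

# One row per (x, z) pair that exists in df2, flagged as present
df2_keys <- df2 |>
  distinct(x, z) |>
  mutate(present = TRUE)

# Rows of df1 without a matching (x, z) pair get NA from the join;
# coalesce() turns those NAs into FALSE
result <- df1 |>
  left_join(df2_keys, by = c("x", "z")) |>
  mutate(present = coalesce(present, FALSE))

result
```

On the example data this should flag only the second row (x = 2, z = A), the same as the rowwise() version. If you only need the matching rows rather than a flag, `semi_join(df1, df2, by = c("x", "z"))` keeps just those.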