
Applying a rowwise str_split doesn't work


I have a character column in which each row holds several email addresses stored as one comma-separated string. A second character column contains email addresses in the same format.

I now want to split the string in the first column (i.e. create a vector of email addresses) and then check, row by row, whether any of these addresses appear in the second column.

I thought this would be an easy task for stringr functions, but it seems they are not vectorized the way I expected.

I know how to achieve this with workarounds (e.g. separating the data into a longer format), but I'm interested in a single call that doesn't reshape anything.

Data:

df <- structure(list(a = c("a@test.com, b@test.com", "x@test.com, y@test.com"),
                     b = c("b@test.com, c@test.com", "d@test.com, e@test.com")),
                class = "data.frame",
                row.names = c(NA, -2L))

My code:

library(dplyr)
library(stringr)
df |>
  mutate(test = any(str_detect(a, str_split_1(b, ", "))))

Error in `mutate()`:
ℹ In argument: `test = any(str_detect(a, str_split_1(b, ", ")))`.
Caused by error in `str_split_1()`:
! `string` must be a single string, not a character vector.
Run `rlang::last_trace()` to see where the error occurred.

It seems str_split_1() doesn't see the single string in each row; instead, mutate() passes it the whole column b as a character vector.
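For context: without grouping, mutate() evaluates its expressions on entire columns, which is why str_split_1() sees a length-2 vector. A minimal sketch, assuming dplyr >= 1.0 and stringr >= 1.5.0 (where str_split_1() exists), of the same call wrapped in rowwise() so that a and b are length 1 inside mutate(). fixed() is swapped in here so the dots in the addresses are matched literally rather than as regex wildcards.

```r
library(dplyr)
library(stringr)

df <- data.frame(
  a = c("a@test.com, b@test.com", "x@test.com, y@test.com"),
  b = c("b@test.com, c@test.com", "d@test.com, e@test.com"),
  stringsAsFactors = FALSE
)

res <- df |>
  rowwise() |>   # each row becomes its own group, so a and b are single strings
  mutate(test = any(str_detect(a, fixed(str_split_1(b, ", "))))) |>
  ungroup()      # drop the rowwise grouping afterwards

print(res)
```

rowwise() is convenient but evaluates the expression once per row, so on large data the vectorized answers below are usually faster.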

Expected output:

                       a                      b  test
1 a@test.com, b@test.com b@test.com, c@test.com  TRUE
2 x@test.com, y@test.com d@test.com, e@test.com FALSE

Solution

  • You could split the strings in b, and then try to detect each sub-string in the corresponding row of a:

    library(dplyr); library(purrr); library(stringr)
    
    df %>%
      mutate(test = map2_lgl(a, str_split(b, ", "), ~ any(str_detect(.x, str_escape(.y)))))
    
                           a                      b  test
    1 a@test.com, b@test.com b@test.com, c@test.com  TRUE
    2 x@test.com, y@test.com d@test.com, e@test.com FALSE
    

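    Or you could replace each separator with | (regex alternation) and search for the resulting pattern in a:

    # str_replace_all() so that every ", " becomes "|", not just the first one
    df %>%
      mutate(test = str_detect(a, str_replace_all(str_escape(b), ", ", "|")))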
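Both approaches above use pattern matching, so an address like b@test.com would also be found inside a longer address such as ab@test.com (a hypothetical value, not in the question's data). If whole addresses should be compared, one possible sketch splits both columns and tests set membership with base R's %in%, assuming dplyr, purrr and stringr are installed:

```r
library(dplyr)
library(purrr)
library(stringr)

df <- data.frame(
  a = c("a@test.com, b@test.com", "x@test.com, y@test.com"),
  b = c("b@test.com, c@test.com", "d@test.com, e@test.com"),
  stringsAsFactors = FALSE
)

res <- df %>%
  mutate(test = map2_lgl(
    str_split(a, ", "),   # list of address vectors from column a
    str_split(b, ", "),   # list of address vectors from column b
    ~ any(.x %in% .y)     # TRUE if the two rows share at least one whole address
  ))

print(res)

# Substring matching vs whole-element matching on a hypothetical address
substring_hit <- str_detect("ab@test.com", fixed("b@test.com"))  # TRUE
whole_hit <- "b@test.com" %in% "ab@test.com"                     # FALSE
```

This still needs no reshaping; the trade-off is that the separator ", " must be consistent in both columns, since the split pieces are compared exactly.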