I have a character column containing email addresses in a single string (separated by commas). I have another character column that also contains such email addresses.
I now want to split up the character string in the first column (i.e. create a vector of email addresses) and then check whether any of these appear in the second column.
I thought this would be an easy task for stringr functions, but it seems they are not vectorized the way I expected.
I know how to achieve this with other workarounds (e.g. separating the addresses into a longer format), but I'm interested in a single call that doesn't reshape anything.
Data:
df <- structure(list(a = c("a@test.com, b@test.com", "x@test.com, y@test.com"),
b = c("b@test.com, c@test.com", "d@test.com, e@test.com")),
class = "data.frame",
row.names = c(NA, -2L))
My code:
library(dplyr)
library(stringr)
df |>
  mutate(test = any(str_detect(a, str_split_1(b, ", "))))
Error in `mutate()`:
ℹ In argument: `test = any(str_detect(a, str_split_1(b, ", ")))`.
Caused by error in `str_split_1()`:
! `string` must be a single string, not a character vector.
Run `rlang::last_trace()` to see where the error occurred.
It seems str_split_1() doesn't receive the single string in each row but is instead given the character vector of the whole column.
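That diagnosis can be checked by making mutate() evaluate the expression one row at a time. The sketch below does this with dplyr::rowwise() (a swapped-in approach, not part of the original question) and also wraps the split addresses in stringr::fixed() so the dots are matched literally rather than as regex wildcards. It assumes stringr >= 1.5.0 (where str_split_1() was introduced) and R >= 4.1 for the native pipe.

```r
library(dplyr)
library(stringr)

df <- structure(list(a = c("a@test.com, b@test.com", "x@test.com, y@test.com"),
                     b = c("b@test.com, c@test.com", "d@test.com, e@test.com")),
                class = "data.frame",
                row.names = c(NA, -2L))

# rowwise() makes mutate() work on one row at a time, so str_split_1()
# receives a single string instead of the whole column b.
# fixed() treats each address as a literal string, not a regex.
res <- df |>
  rowwise() |>
  mutate(test = any(str_detect(a, fixed(str_split_1(b, ", "))))) |>
  ungroup()

res
```

rowwise() keeps the original call almost unchanged, at the cost of per-row evaluation, which can be slow on large data.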
Expected output:
a b test
1 a@test.com, b@test.com b@test.com, c@test.com TRUE
2 x@test.com, y@test.com d@test.com, e@test.com FALSE
You could split the string in b and then try to detect each sub-string in the corresponding value of a:
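library(dplyr); library(purrr); library(stringr)
# map2_lgl() pairs each row's value of a with that row's split addresses from b
df %>%
  mutate(test = map2_lgl(a, str_split(b, ", "), ~ any(str_detect(.x, str_escape(.y)))))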
a b test
1 a@test.com, b@test.com b@test.com, c@test.com TRUE
2 x@test.com, y@test.com d@test.com, e@test.com FALSE
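Because str_detect() matches sub-strings, an address such as b@test.com would also be found inside ab@test.com. If only whole-address matches should count, one option (an alternative not taken from the original answer) is to split both columns and compare the pieces with %in%. The third row below is hypothetical data added only to show the difference.

```r
library(dplyr)
library(purrr)
library(stringr)

# Hypothetical data: the first two rows match the question, the third row
# shows a sub-string hit that is not an exact address match.
df3 <- data.frame(
  a = c("a@test.com, b@test.com", "x@test.com, y@test.com", "ab@test.com"),
  b = c("b@test.com, c@test.com", "d@test.com, e@test.com", "b@test.com")
)

res <- df3 %>%
  mutate(
    # literal sub-string detection, row by row
    substring_test = map2_lgl(a, str_split(b, ", "), ~ any(str_detect(.x, fixed(.y)))),
    # exact comparison of whole addresses, row by row
    exact_test = map2_lgl(str_split(a, ", "), str_split(b, ", "), ~ any(.y %in% .x))
  )

res
```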
Or you could replace the separator with | and search for all of the sub-strings in a at once:
# str_replace_all() turns every ", " separator into "|", not just the first one
df %>%
  mutate(test = str_detect(a, str_replace_all(str_escape(b), ", ", "|")))
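With two addresses per cell, str_replace() and str_replace_all() produce the same pattern, but str_replace() only replaces the first ", ". A quick check on a hypothetical three-address value shows why str_replace_all() is needed once cells hold more than two addresses:

```r
library(stringr)

# Hypothetical cell with three addresses
b <- "b@test.com, c@test.com, d@test.com"

# str_escape() escapes the dots; the separators are then turned into "|"
first_only <- str_replace(str_escape(b), ", ", "|")      # first separator only
all_seps   <- str_replace_all(str_escape(b), ", ", "|")  # every separator

first_only
all_seps

# Only the fully alternated pattern finds the third address
str_detect("d@test.com", first_only)
str_detect("d@test.com", all_seps)
```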