Suppose I have these variables in R:
vals <- c("b", "c")
foo <- data.frame(x=c("a|b", "b|c", "c|d", "e|f|g"))
I'd like another column in foo that has the number of items from vals, e.g.
> foo2
      x y
1   a|b 1
2   b|c 2
3   c|d 1
4 e|f|g 0
Here y is 1 because "a|b" contains "b", 2 because "b|c" contains both "b" and "c", and so on.
How do I do that with tidyverse functions?
I can split x, but the intersection isn't working. A couple of failed attempts:
library(dplyr)
library(magrittr)
library(stringr)  # provides str_split()
> foo2 <- foo %>% mutate(x1=str_split(x, "\\|"), y=intersect(vals, x1))
Error in `mutate()`:
ℹ In argument: `y = intersect(vals, x1)`.
Caused by error:
! `y` must be size 4 or 1, not 0.
> foo2 <- foo %>% mutate(x1=str_split(x, "\\|"), y=intersect(vals, x1[[1]]))
> foo2
      x      x1 y
1   a|b    a, b b
2   b|c    b, c b
3   c|d    c, d b
4 e|f|g e, f, g b
You need to map (or lapply) your intersect to apply it separately to each row:
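library(dplyr)
library(purrr)
foo |>
  mutate(
    # split each string on a literal "|" (fixed = TRUE, so no regex escaping)
    xsplit = strsplit(x, split = "|", fixed = TRUE),
    # intersect each row's pieces with vals
    intersect = map(xsplit, intersect, vals),
    # count the matches in each row
    y = lengths(intersect)
  )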
#       x  xsplit intersect y
# 1   a|b    a, b         b 1
# 2   b|c    b, c      b, c 2
# 3   c|d    c, d         c 1
# 4 e|f|g e, f, g           0
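If you only need the count and not the intermediate list columns, the same idea can probably be collapsed into a single step with map_int. This is a minimal sketch, assuming R 4.1 or later (for |> and the \(parts) lambda syntax); the names foo2 and parts are just for illustration:

```r
library(dplyr)
library(purrr)

vals <- c("b", "c")
foo <- data.frame(x = c("a|b", "b|c", "c|d", "e|f|g"))

foo2 <- foo |>
  mutate(
    # split each string on a literal "|", then count the distinct pieces found in vals
    y = map_int(
      strsplit(x, split = "|", fixed = TRUE),
      \(parts) length(intersect(parts, vals))
    )
  )

foo2
#       x y
# 1   a|b 1
# 2   b|c 2
# 3   c|d 1
# 4 e|f|g 0
```

Note that intersect drops duplicates, so a value like "b|b" would count as 1. If repeated items should each be counted, sum(parts %in% vals) inside the lambda would be one way to do that instead.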