I have a data frame df, where id identifies a person and fruits_eat lists the fruits that person eats. I also have a vector fruits_list storing a list of fruits.
I want to generate a new variable fruits_in_list indicating whether a person ate one or more of the fruits in fruits_list, but I don't know how to implement this in R.
I checked some existing answers, but none of them are very relevant to my problem.
# Fruits each person eats, stored as one comma-separated string per person
fruits_Jack = c('XXappleYYY,lemon,orange,pitaya')
fruits_Rose = c('Navel orange,Blood orange,watermelon,cherry')
fruits_Biden = c('pitaya,cherry,banana')
# Fruits to look for
fruits_list = c('apple', 'lemon', 'orange', 'watermelon', 'peach', 'pear')
df = data.frame(id = c('Jack', 'Rose', 'Biden'),
                fruits_eat = c(fruits_Jack, fruits_Rose, fruits_Biden))
> df
     id                                  fruits_eat
1  Jack              XXappleYYY,lemon,orange,pitaya
2  Rose Navel orange,Blood orange,watermelon,cherry
3 Biden                        pitaya,cherry,banana
df_expect = cbind(df, fruits_in_list = c(1, 1, 0))
> df_expect
     id                                  fruits_eat fruits_in_list
1  Jack              XXappleYYY,lemon,orange,pitaya              1
2  Rose Navel orange,Blood orange,watermelon,cherry              1
3 Biden                        pitaya,cherry,banana              0
With stringr, use str_detect to test for a match, or str_count if you want the actual number of matches:
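library(stringr)
library(dplyr)

# Combine the fruits into one regex alternation: "apple|lemon|...|pear"
pattern <- paste0(fruits_list, collapse = "|")

df %>%
  mutate(fruits_in_list = +(str_detect(fruits_eat, pattern)),  # unary + turns TRUE/FALSE into 1/0
         count = str_count(fruits_eat, pattern))               # number of matches per row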
     id                                  fruits_eat fruits_in_list count
1  Jack              XXappleYYY,lemon,orange,pitaya              1     3
2  Rose Navel orange,Blood orange,watermelon,cherry              1     3
3 Biden                        pitaya,cherry,banana              0     0
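
If you would rather not load any packages, the same idea can be sketched in base R. This is only an illustrative alternative, not part of the stringr approach above: it assumes grepl() for the 0/1 flag and gregexpr() for the count, and it rebuilds the question's data so that it runs on its own. The name pattern is just a helper variable introduced here.

```r
# Rebuild the example data from the question.
fruits_list <- c("apple", "lemon", "orange", "watermelon", "peach", "pear")
df <- data.frame(
  id = c("Jack", "Rose", "Biden"),
  fruits_eat = c("XXappleYYY,lemon,orange,pitaya",
                 "Navel orange,Blood orange,watermelon,cherry",
                 "pitaya,cherry,banana")
)

# One regex alternation: "apple|lemon|orange|watermelon|peach|pear".
pattern <- paste(fruits_list, collapse = "|")

# grepl() returns TRUE/FALSE per row; as.integer() turns that into 1/0.
df$fruits_in_list <- as.integer(grepl(pattern, df$fruits_eat))

# gregexpr() returns the match positions for each row (-1 when nothing matches),
# so counting the positive positions gives the number of matches.
df$count <- vapply(gregexpr(pattern, df$fruits_eat),
                   function(m) sum(m > 0), integer(1))
```

Like the stringr version, this matches substrings, which is why XXappleYYY counts as apple. If a fruit name could contain regex metacharacters, or you need whole-item matches only, the pattern would have to be adapted.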