I have data as follows:
userID <- c(1,1,1,2,2,2,3,3,3)
product <- c("a","a","a","b","b","c","a","b","c")
df <- data.frame(userID, product)
For each 'userID', I want to create a binary indicator variable that is 1 if the user has more than one unique product, and 0 if all of the user's products are the same.
So my filled vector would look like:
df$result <- c(0,0,0,1,1,1,1,1,1)
#   userID product result
# 1      1       a      0
# 2      1       a      0
# 3      1       a      0
# 4      2       b      1
# 5      2       b      1
# 6      2       c      1
# 7      3       a      1
# 8      3       b      1
# 9      3       c      1
E.g. user 1 has only one distinct product ('a') -> result = 0. User 2 has more than one unique product ('b' and 'c') -> result = 1.
You could use ave from base R:
df$result <- with(df, ave(as.character(product), userID,
                          FUN = function(x) length(unique(x))) > 1) + 0
df$result
# [1] 0 0 0 1 1 1 1 1 1
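One thing worth knowing about the call above: because the first argument to ave is a character vector, the per-group counts come back as character strings ("1", "2", ...), so the > 1 comparison is done on strings. It gives the right answer here, but if you would rather keep the counts numeric, a minimal sketch (assuming the same data as in the question; the names codes, n_products and result_num are only for illustration) could look like this:

```r
userID  <- c(1, 1, 1, 2, 2, 2, 3, 3, 3)
product <- c("a", "a", "a", "b", "b", "c", "a", "b", "c")
df <- data.frame(userID, product)

# Turn the products into integer codes so that ave() returns integers
codes <- as.integer(factor(df$product))

# Number of distinct products per user, repeated for every row of that user
n_products <- ave(codes, df$userID, FUN = function(x) length(unique(x)))

# 1 if the user has more than one distinct product, 0 otherwise
result_num <- as.integer(n_products > 1)
result_num
# [1] 0 0 0 1 1 1 1 1 1
```

The intermediate n_products is also handy if you later need the actual number of distinct products rather than just the indicator.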
Or, as suggested by @David Arenburg, you could use transform to create the new variable result within df:
transform(df, result = (ave(as.character(product), userID,
                            FUN = function(x) length(unique(x))) > 1) + 0)
Or, using table (here df[,-3] drops the result column added above):
tbl <- rowSums(!!table(df[,-3]))>1
(df$userID %in% names(tbl)[tbl])+0
#[1] 0 0 0 1 1 1 1 1 1
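If the table one-liner looks cryptic, it may help to see the intermediate steps. The following is only a sketch of the same idea, assuming the data from the question; it selects the two columns by name instead of using df[,-3], writes counts > 0 where the original uses !!, and the names counts, present, n_distinct, multi and result_tbl are made up for illustration:

```r
userID  <- c(1, 1, 1, 2, 2, 2, 3, 3, 3)
product <- c("a", "a", "a", "b", "b", "c", "a", "b", "c")
df <- data.frame(userID, product)

# Cross-tabulate users (rows) against products (columns)
counts <- table(df$userID, df$product)

# TRUE where a user has the product at least once (same effect as !!counts)
present <- counts > 0

# Number of distinct products per user, named by userID
n_distinct <- rowSums(present)

# Users with more than one distinct product
multi <- n_distinct > 1

# Flag every row that belongs to one of those users
result_tbl <- as.integer(df$userID %in% names(multi)[multi])
result_tbl
# [1] 0 0 0 1 1 1 1 1 1
```

Note that names(multi) are character strings, so the %in% comparison relies on R coercing the numeric userID values to character when matching.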