r  dataframe  data-manipulation  split-apply-combine

Create binary variable based on number of unique / distinct values by group


I have data as follows:

userID  <- c(1,1,1,2,2,2,3,3,3)
product <- c("a","a","a","b","b","c","a","b","c")
df <- data.frame(userID, product)

For each 'userID', I want to create a binary indicator variable which is 1 if there is more than one unique product, and 0 if all products are the same.

So my filled vector would look like:

df$result <- c(0,0,0,1,1,1,1,1,1)
#    userID product result
# 1      1       a      0
# 2      1       a      0
# 3      1       a      0
# 4      2       b      1
# 5      2       b      1
# 6      2       c      1
# 7      3       a      1
# 8      3       b      1
# 9      3       c      1

E.g. user 1 has only one distinct product ('a') -> result = 0. User 2 has more than one unique product ('b' and 'c') -> result = 1.


Solution

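  • You could use ave from base R:

     # Count the distinct products within each userID, then turn TRUE/FALSE into 1/0.
     # as.character() guards against product being a factor.
     df$result <- with(df, ave(as.character(product), userID, 
                     FUN=function(x) length(unique(x)))>1) +0 
     df$result
     # [1] 0 0 0 1 1 1 1 1 1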
    

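    Or, as suggested by @David Arenburg, you could use transform to create the new variable result within df. Note that transform returns a modified copy, so assign the output back to df if you want to keep the column: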

      transform(df, result = (ave(as.character(product), 
              userID, FUN = function(x) length(unique(x)))>1)+0)
    

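    Or, using table to count the distinct products per user:

    # !!table(...) is TRUE wherever a userID/product combination occurs at least once,
    # so rowSums() gives the number of distinct products per user
    tbl <- rowSums(!!table(df[, c("userID", "product")]))>1
    (df$userID %in% names(tbl)[tbl])+0
    # [1] 0 0 0 1 1 1 1 1 1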
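
    The grouping logic used in the options above can also be spelled out step by step, which may make it easier to see what is happening. This is only a minimal base R sketch, assuming the example data from the question; the name n_products is illustrative and not part of the original answer:

```r
# Example data from the question
userID  <- c(1, 1, 1, 2, 2, 2, 3, 3, 3)
product <- c("a", "a", "a", "b", "b", "c", "a", "b", "c")
df <- data.frame(userID, product)

# Split + apply: number of distinct products for each userID.
# tapply() returns a vector named by the group labels ("1", "2", "3").
n_products <- tapply(as.character(df$product), df$userID,
                     function(x) length(unique(x)))

# Combine: look up each row's count by user name, then convert
# the logical comparison to a 0/1 integer indicator.
df$result <- as.integer(n_products[as.character(df$userID)] > 1)
df$result
# [1] 0 0 0 1 1 1 1 1 1
```

    Unlike the ave version, the intermediate per-user counts stay numeric here, so the comparison with 1 does not depend on type coercion.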