rduplicatesrsample

Match Each Winner with a Unique Prize


In a contest, each winner and prize is assigned a random integer [1, 9] called a "ticket" number and a unique "ID" number [1111, 9999]. Each winner receives a unique prize from a limited stock of prizes based on the winner's ticket number ±1.

Question 1: Duplicate Prizes

How can I prevent the script (below) from returning duplicate prizes? I've used the duplicate() function before, but I'm unsure how to implement it in this case.

Question 2: Cannot Match a Winner with a Prize

How would I implement this rule in my script: If a non-duplicated prize cannot be found, then return a prize from the unclaimed stock that is the next closest match.

Here's what I have thus far:

# Function to generate a data frame with random parameters
generate <- function(n) {
  ID <- as.factor(sample(1111:9999, n))
  ticket <- sample(1:9, n, replace = TRUE)
  lower.bound <- ticket - 1
  upper.bound <- ticket + 1
  winners.df <- cbind.data.frame(ID, ticket, lower.bound, upper.bound)
  return(winners.df)
}

# Generate a master data frame
master <- generate(20)

# Split master data frame into "prizes" and "winners"
prizes  <- master[1:16, ]
winners   <- master[17:20, ]

# Eliminate upper/lower bound columns in prizes as they are not needed
prizes <- prizes[, -c(3, 4)]

# Set an empty variable to serve as a container
picks <- list(NULL)

for (x in 1:length(winners$ID)) {
  pool <- subset(prizes, ticket >= winners$lower.bound[x] & ticket <= winners$upper.bound[x])
  picks[[x]] <- pool[sample(nrow(pool), 1), ]
}

picks <- do.call(rbind.data.frame, picks)

# Generate a summary of winners and their prizes
winners.prizes <- data.frame(winnerID = winners$ID,
                             winnerTicket = winners$ticket,
                             prizeID = picks$ID,
                             prizeTicket = picks$ticket)

Solution

  • Original Answer

    For question 1.

    You need to remove the prize chosen from the prizes data.frame in order for them not to be picked again.

    # Assign a unique prize to each winner
    for (x in 1:length(winners$ID)) {
      pool <- subset(prizes, ticket >= winners$lower.bound[x] & ticket <= winners$upper.bound[x])
      
      # Assign a prize to prize var and remove it from prizes
      prize = pool[sample(nrow(pool), 1), ]
      prizes = prizes[!(prizes$ID %in% prize$ID),]
      
      picks[[x]] <- prize
    }
    

    New Answer

    I've put a little more thought into this as I looked more into your code.

    I would avoid using subset as it can have unintended consequences. Also it's not necessary to save your picks into a list if you're just going to transform it into a data.frame. You're better off starting with a data.frame and then updating it. Lastly, I think it may be better to include a new column that highlights whether or not the prize was chosen versus removing the chosen prize from your initial set.

    One final note - I would recommend not using periods in variable names. They can be misinterpreted as S3 methods.

    I set up a function to generate the winners table and a prizes table to show which were/weren't chosen. Too many variables were being created in the global env. So it makes more sense to keep this contained.

    set.seed(100) # for reproducibility
    
    # Generate a data frame with random parameters
    generate <- function(n) {
      ticket <- sample(1:9, n, replace = TRUE)
      ID <- as.factor(sample(1111:9999, n))
      lower.bound <- ticket - 1
      upper.bound <- ticket + 1
      winners.df <- cbind.data.frame(ID, ticket, lower.bound, upper.bound)
      return(winners.df)
    }
    
    # Generate a master data frame
    master <- generate(20)
    
    # Split master data frame into "prizes" and "winners"
    hold <- sample(c(TRUE, FALSE), nrow(master), replace = TRUE, prob = c(0.75, 0.25))
    prizes <- master[hold, ]
    winners <- master[!hold, ]
    
    # Eliminate upper/lower bound columns in prizes as they are not needed
    prizes <- prizes[, -c(3, 4)]
    
    winners_output = function(w, p) {
      winners_dt = data.frame()
      p$chosen = FALSE
      
      # Assign a unique prize to each winner
      for (i in 1:length(w$ID)) {
        upper_b = w$upper.bound[i] 
        lower_b = w$lower.bound[i]
        
        # Subset to only prizes not chosen and create sample pool
        avail_prizes = p[!(p$chosen),]
        pool <- avail_prizes[avail_prizes$ticket >= lower_b &avail_prizes$ticket <= upper_b,]
        
        # Assign a prize and remove it from prizes
        assigned_prize = pool[sample(nrow(pool), 1), ]
        
        # Update chosen prize to TRUE
        p$chosen[which(p$ID == assigned_prize$ID)] = TRUE
        
        # Set up data frame for each winner
        w_dt = data.frame(
          winnerID = w$ID[i],
          winnerTicket = w$ticket[i],
          prizeID = assigned_prize$ID,
          prizeTicket = assigned_prize$ticket
        )
        
        # add to full winners dt
        winners_dt = rbind(winners_dt, w_dt)
      }
      
      # return all winners plus chosen prizes
      return(list(
        winners_dt = winners_dt,
        prizes = p))
    }
    
    w = winners_output(w = winners, p = prizes)
    
    # > w$winners_dt
    #   winnerID winnerTicket prizeID prizeTicket
    # 1     7578            6    2927           7
    # 2     6397            7    2741           7
    # 3     8655            6    3100           6
    # 4     2918            6    6388           7
    # 5     9907            2    1333           2
    # 6     4852            7    7882           7
    # 7     8174            8    7095           8
    # > w$prizes
    #      ID ticket chosen
    # 1  2927      7   TRUE
    # 3  9590      3  FALSE
    # 4  9649      9  FALSE
    # 8  6177      4  FALSE
    # 9  7882      7   TRUE
    # 10 3100      6   TRUE
    # 13 6388      7   TRUE
    # 14 2741      7   TRUE
    # 15 7095      8   TRUE
    # 16 1333      2   TRUE
    # 17 9203      3  FALSE
    # 18 7505      3  FALSE
    # 20 6204      2  FALSE