In a contest, each winner and prize is assigned a random integer [1, 9] called a "ticket" number and a unique "ID" number [1111, 9999]. Each winner receives a unique prize from a limited stock of prizes based on the winner's ticket number ±1.
How can I prevent the script (below) from returning duplicate prizes? I've used the duplicate() function before, but I'm unsure how to implement it in this case.
How would I implement this rule in my script: If a non-duplicated prize cannot be found, then return a prize from the unclaimed stock that is the next closest match.
Here's what I have thus far:
# Function to generate a data frame with random parameters
generate <- function(n) {
ID <- as.factor(sample(1111:9999, n))
ticket <- sample(1:9, n, replace = TRUE)
lower.bound <- ticket - 1
upper.bound <- ticket + 1
winners.df <- cbind.data.frame(ID, ticket, lower.bound, upper.bound)
return(winners.df)
}
# Generate a master data frame
master <- generate(20)
# Split master data frame into "prizes" and "winners"
prizes <- master[1:16, ]
winners <- master[17:20, ]
# Eliminate upper/lower bound columns in prizes as they are not needed
prizes <- prizes[, -c(3, 4)]
# Set an empty variable to serve as a container
picks <- list(NULL)
for (x in 1:length(winners$ID)) {
pool <- subset(prizes, ticket >= winners$lower.bound[x] & ticket <= winners$upper.bound[x])
picks[[x]] <- pool[sample(nrow(pool), 1), ]
}
picks <- do.call(rbind.data.frame, picks)
# Generate a summary of winners and their prizes
winners.prizes <- data.frame(winnerID = winners$ID,
winnerTicket = winners$ticket,
prizeID = picks$ID,
prizeTicket = picks$ticket)
Original Answer
For question 1.
You need to remove the prize chosen from the prizes data.frame in order for them not to be picked again.
# Assign a unique prize to each winner
for (x in 1:length(winners$ID)) {
pool <- subset(prizes, ticket >= winners$lower.bound[x] & ticket <= winners$upper.bound[x])
# Assign a prize to prize var and remove it from prizes
prize = pool[sample(nrow(pool), 1), ]
prizes = prizes[!(prizes$ID %in% prize$ID),]
picks[[x]] <- prize
}
New Answer
I've put a little more thought into this as I looked more into your code.
I would avoid using subset
as it can have unintended consequences. Also it's not necessary to save your picks into a list if you're just going to transform it into a data.frame. You're better off starting with a data.frame and then updating it. Lastly, I think it may be better to include a new column that highlights whether or not the prize was chosen versus removing the chosen prize from your initial set.
One final note - I would recommend not using periods in variable names. They can be misinterpreted as S3 methods.
I set up a function to generate the winners table and a prizes table to show which were/weren't chosen. Too many variables were being created in the global env. So it makes more sense to keep this contained.
set.seed(100) # for reproducibility
# Generate a data frame with random parameters
generate <- function(n) {
ticket <- sample(1:9, n, replace = TRUE)
ID <- as.factor(sample(1111:9999, n))
lower.bound <- ticket - 1
upper.bound <- ticket + 1
winners.df <- cbind.data.frame(ID, ticket, lower.bound, upper.bound)
return(winners.df)
}
# Generate a master data frame
master <- generate(20)
# Split master data frame into "prizes" and "winners"
hold <- sample(c(TRUE, FALSE), nrow(master), replace = TRUE, prob = c(0.75, 0.25))
prizes <- master[hold, ]
winners <- master[!hold, ]
# Eliminate upper/lower bound columns in prizes as they are not needed
prizes <- prizes[, -c(3, 4)]
winners_output = function(w, p) {
winners_dt = data.frame()
p$chosen = FALSE
# Assign a unique prize to each winner
for (i in 1:length(w$ID)) {
upper_b = w$upper.bound[i]
lower_b = w$lower.bound[i]
# Subset to only prizes not chosen and create sample pool
avail_prizes = p[!(p$chosen),]
pool <- avail_prizes[avail_prizes$ticket >= lower_b &avail_prizes$ticket <= upper_b,]
# Assign a prize and remove it from prizes
assigned_prize = pool[sample(nrow(pool), 1), ]
# Update chosen prize to TRUE
p$chosen[which(p$ID == assigned_prize$ID)] = TRUE
# Set up data frame for each winner
w_dt = data.frame(
winnerID = w$ID[i],
winnerTicket = w$ticket[i],
prizeID = assigned_prize$ID,
prizeTicket = assigned_prize$ticket
)
# add to full winners dt
winners_dt = rbind(winners_dt, w_dt)
}
# return all winners plus chosen prizes
return(list(
winners_dt = winners_dt,
prizes = p))
}
w = winners_output(w = winners, p = prizes)
# > w$winners_dt
# winnerID winnerTicket prizeID prizeTicket
# 1 7578 6 2927 7
# 2 6397 7 2741 7
# 3 8655 6 3100 6
# 4 2918 6 6388 7
# 5 9907 2 1333 2
# 6 4852 7 7882 7
# 7 8174 8 7095 8
# > w$prizes
# ID ticket chosen
# 1 2927 7 TRUE
# 3 9590 3 FALSE
# 4 9649 9 FALSE
# 8 6177 4 FALSE
# 9 7882 7 TRUE
# 10 3100 6 TRUE
# 13 6388 7 TRUE
# 14 2741 7 TRUE
# 15 7095 8 TRUE
# 16 1333 2 TRUE
# 17 9203 3 FALSE
# 18 7505 3 FALSE
# 20 6204 2 FALSE