rprobabilitystochastic-process

Average time in a restaurant in R


I am visiting a restaurant that has a menu with N dishes. Every time that I visit the restaurant I pick one dish at random. I am thinking, what is the average time until I taste all the N dishes in the restaurant?

I think that the number of dishes that I have tasted after n visits in the restaurant is a Markov chain with transition probabilities:

p_{k,k+1} = (N-k)/N 

and

p_{k,k} = k/N 

for k =0,1,2,...,N

I want to simulate this process in R. Doing so (I need help here) given that the restaurant has 100 dishes I did:

nits <- 1000 #simulate the problem 1000 times
count <- 0
N = 100 # number of dishes 
for (i in 1:nits){
 x <- 1:N
 while(length(x) > 0){
   x <- x[x != sample(x=x,size=1)] # pick one dish at random that I have not tasted
   count <- count + 1/nits
 } 
}
count

I want some help because my mathematical result is the the average time is N*log(N) and the code above produces different results.


Solution

  • You have 2 issues.

    1. It's always a red flag when you loop over i, but don't use i inside the loop. Set up a structure to hold the results of every iteration:
    results = integer(length = nits)
    ...
    for (i in 1:nits){
     ...
     while(length(x) > 0){
       ...
     } 
     results[i] <- count
    }
    
    1. Your text says

    pick one dish at random

    Your code says

    pick one dish at random that I have not tasted

    If you always pick a dish you have not tasted, then the problem is trivial - it will take N visits. Let's adjust your code to pick on dish at random whether you have tasted it or not:

    nits <- 1000 #simulate the problem 1000 times
    results = integer(length = nits)
    N = 100 # number of dishes 
    for (i in 1:nits){
      dishes = 1:N
      tasted = rep(0, N) 
      count = 0
      while(sum(tasted) < N){
        tasted[sample(dishes, size = 1)] = 1
        count = count + 1
      } 
      results[i] = count
    }
    results
    

    Looking at the results, I think you may have made a math error:

    100 * log(100)
    # [1] 460.517
    mean(results)
    # [1] 518.302
    

    You can read more about this problem on Wikipedia: Coupon Collector's Problem. Using the result there, the simulation is doing quite well:

    100 * log(100) + .577 * 100 + 0.5
    # [1] 518.717