Average time in a restaurant in R

I am visiting a restaurant that has a menu with N dishes. Every time that I visit the restaurant I pick one dish at random. I am thinking, what is the average time until I taste all the N dishes in the restaurant?

I think that the number of dishes that I have tasted after n visits in the restaurant is a Markov chain with transition probabilities:

p_{k,k+1} = (N-k)/N

and

p_{k,k} = k/N 

for k =0,1,2,...,N

I want to simulate this process in R. Doing so (I need help here) given that the restaurant has 100 dishes I did:

nits <- 1000 #simulate the problem 1000 times
count <- 0
N = 100 # number of dishes 
for (i in 1:nits){
 x <- 1:N
 while(length(x) > 0){
   x <- x[x != sample(x=x,size=1)] # pick one dish at random that I have not tasted
   count <- count + 1/nits
 } 
}
count

I want some help because my mathematical result is the the average time is N*log(N) and the code above produces different results.

Solution

You have 2 issues.

It's always a red flag when you loop over i, but don't use i inside the loop. Set up a structure to hold the results of every iteration:

results = integer(length = nits)
...
for (i in 1:nits){
 ...
 while(length(x) > 0){
   ...
 } 
 results[i] <- count
}

Your text says

pick one dish at random

Your code says

pick one dish at random that I have not tasted

If you always pick a dish you have not tasted, then the problem is trivial - it will take N visits. Let's adjust your code to pick on dish at random whether you have tasted it or not:

nits <- 1000 #simulate the problem 1000 times
results = integer(length = nits)
N = 100 # number of dishes 
for (i in 1:nits){
  dishes = 1:N
  tasted = rep(0, N) 
  count = 0
  while(sum(tasted) < N){
    tasted[sample(dishes, size = 1)] = 1
    count = count + 1
  } 
  results[i] = count
}
results

Looking at the results, I think you may have made a math error:

100 * log(100)
# [1] 460.517
mean(results)
# [1] 518.302

You can read more about this problem on Wikipedia: Coupon Collector's Problem. Using the result there, the simulation is doing quite well:

100 * log(100) + .577 * 100 + 0.5
# [1] 518.717