pythonmathdiscrete-mathematicsbirthday-paradox

How many samples needed for a collision (birthday paradox)


Just trying to understand birthday paradox. Using a following code I figured out that I need on average 12 samples to get a birthday collision.
Can not understand why it is much lower than normal 23 people to get a 1/2 chance of a birthday collision. The result does not change even if I use StrongRandom from PyCrypto.

from random import randint
from Crypto.Random.random import StrongRandom
EXPERIMENTS_NUM = 10000
SET_SIZE = 365
SUBSET = 23

where_collision_found = list()
rnd = StrongRandom()
for experiment in range(EXPERIMENTS_NUM):
  for i in range(1,SET_SIZE + 2):
    collision_found = False
    #Generate a subset
    # subset = [rnd.randint(1, SET_SIZE) for x in range(i)]
    subset = [randint(1, SET_SIZE) for x in range(i)]
    # Check for collision
    flags = [False for x in range(SET_SIZE + 1)]
    for k in range(i):
      if flags[subset[k]]: #Collision found
        collision_found = True
      else:
        flags[subset[k]] = True

    if collision_found:
      # print 'Collision found in set:', subset
      break
  where_collision_found.append(i)
print 'average collision:', sum(where_collision_found) / float(len(where_collision_found)), 'in', EXPERIMENTS_NUM, 'experiments'

Result:
average collision: 12.1277 in 10000 experiments


Solution

  • I'm not really clear on what your code is doing. Here's what I did just now:

    from random import randrange
    N = 365
    ns = []
    for _ in range(10000):
        n = 0
        seen = set()
        while True:
            b = randrange(N)
            n += 1
            if b in seen:
                break
            seen.add(b)
        ns.append(n)
    print(sum(ns) / float(len(ns)))
    

    And output from a sample run:

    24.6577
    

    This is good. The "23" you're expecting is the median of the distribution; the mean (average) is expected to be 24.61659 ... See here: https://en.wikipedia.org/wiki/Birthday_problem#Average_number_of_people