javaparadoxbirthday-paradox

Birthday paradox


I want to simulate the birthday paradox in java. For some reason my output(probability) keeps getting very close to 1 e.g. simulation(10)->0,9268. In start you can see the probabilities my simulations should be close to. I have been looking for a mistake in my code for quite a while now so i hope any of you might be able to help me. I've looked up other codes of the birthday paradox but none of them seem able to help me with my strange output. p.s. you can ignore the //TODO, ill fix that once i get the code up and running. Thanks is advance!

static final int DAYS_IN_YEAR = 365;
static final int NUMBER_OF_SIMULATIONS = 500; 

LCG random;
HashSet<Integer> birthdaySet = new HashSet<Integer>();

BirthdayProblem2(){
    birthdaySet.clear();
    random = new LCG(); //random numbers between 0 and 1
}

int generateBirthday(){ //generates random birthday
    return (int) Math.round(Math.random()*DAYS_IN_YEAR); //TODO use LGC
}

double iteration(int numberOfStudents){ //one iteration from the simulation
    int sameBirthdays = 0;

    for (int i = 0; i < numberOfStudents; i++){
        int bd = generateBirthday();

        if (birthdaySet.contains(bd)){
            sameBirthdays++;
        } else {
            birthdaySet.add(bd);
        }           
    }
    return (double) sameBirthdays/numberOfStudents; //probability of two students having the same birthday
}

void simulation(int numberOfStudents){
    double probabilitySum = 0;
    for (int i = 0; i < NUMBER_OF_SIMULATIONS; i++){
        probabilitySum += iteration(numberOfStudents);
    }
    System.out.printf("For n=%d -> P=%.4f \n", numberOfStudents, probabilitySum/NUMBER_OF_SIMULATIONS);
}

private void start() {

    simulation(10); //should be about 0.1
    simulation(20); //should be about 0.4
    simulation(23); //should be about 0.5
    simulation(35); //should be about 0.8
}

public static void main(String[] argv) {
    new BirthdayProblem2().start();
}

}


Solution

  • You are not clearing the birthdaySet between iterations. Change it from a field to a local variable will prevent errors like that, because why would you need it as a field in the first place?

    On a different note, generateBirthday should use Random.nextInt(DAYS_IN_YEAR). An instance of Random is a prime candidate as field. What is that LCG random; line anyway?

    UPDATE

    Method iteration() should return boolean, for whether or not there are students with same birthday. Number of true values returned divided by number of calls equals probability.

    Since there is a relative small number of days in a year, a boolean array will be faster than a Set, so here is updated code:

    private static final int DAYS_IN_YEAR = 365;
    private static final int NUMBER_OF_SIMULATIONS = 500;
    private Random rnd = new Random();
    private int generateBirthday() {
        return this.rnd.nextInt(DAYS_IN_YEAR);
    }
    private boolean iteration(int numberOfStudents) { //one iteration from the simulation
        boolean[] present = new boolean[DAYS_IN_YEAR];
        for (int i = 0; i < numberOfStudents; i++) {
            int birthday = generateBirthday();
            if (present[birthday])
                return true;
            present[birthday] = true;
        }
        return false;
    }
    void simulation(int numberOfStudents){
        int count = 0;
        for (int i = 0; i < NUMBER_OF_SIMULATIONS; i++)
            if (iteration(numberOfStudents))
                count++;
        System.out.printf("For n=%d -> P=%.4f%n", numberOfStudents, (double)count/NUMBER_OF_SIMULATIONS);
    }
    

    Sample Output

    For n=10 -> P=0.1340
    For n=20 -> P=0.4120
    For n=23 -> P=0.5080
    For n=35 -> P=0.8160
    
    For n=10 -> P=0.1200
    For n=20 -> P=0.4120
    For n=23 -> P=0.5260
    For n=35 -> P=0.8140
    
    For n=10 -> P=0.1260
    For n=20 -> P=0.3920
    For n=23 -> P=0.5080
    For n=35 -> P=0.7920
    

    Increasing NUMBER_OF_SIMULATIONS to 5_000_000:

    For n=10 -> P=0.1167
    For n=20 -> P=0.4113
    For n=23 -> P=0.5075
    For n=35 -> P=0.8145
    
    For n=10 -> P=0.1168
    For n=20 -> P=0.4113
    For n=23 -> P=0.5072
    For n=35 -> P=0.8142