I am generating random order tracking numbers that should be short, yet should not be duplicate.
For a tracking number consisting of 3 digits, we will generate a duplicate random number after 40 attempts on average.
If we bump it to 12 digits, it will take on average 1.3 million attempts to generate a duplicate random number.
What is a more general formula for calculating how many attempts on average it will take to generate a duplicate random number up to a predefined limit?
Empirically, I can figure it out using this code, but I am looking for a more general solution:
/**
* Calculates the average iteration at which we will generate
* a random integer that has been generated before.
* This numbers is much smaller than expected due to birthday paradox.
*/
// Generate random numbers up to (exclusive) this number:
const RANGE = 1000000000000;
let foundCollisions = 0;
const foundAt = [];
while(true) {
let i = 0;
const exists = {}
while(true) {
i++;
const random = Math.floor(Math.random() * RANGE);
if (exists[random]) {
// We found a duplicate number:
break;
} else {
exists[random] = true;
}
}
foundCollisions++;
foundAt.push(i);
// Calculate the average iteration at which we found a duplicate number:
const averageFoundAt = foundAt.reduce((total, num) => total + num) / foundCollisions;
console.log(`Found ${foundCollisions} collisions, average at ${Math.round(averageFoundAt)}th iteration.`);
}
Average value of tryouts before duplicate is described at Wiki birthday paradox page and
n(average) = 1 + Q(M)
where M is your range and
Q(M) = Sum[k=1..M](M!/(M-k)!M^k)
Ramanujan approximation gives 40.3 for M=1000 and 1253314 for 10^12