uuidbirthday-paradox

Can i safely take the head of an uuidv4 to arrive at the collision/space-consumption tradeoff I want?


I'm wondering if I can safely calculate the chances of collision using the birthday-paradox, by taking a variable head (i.e.: the x first characters) of an uuidv4.

usecase: I want random id's with small chances for collision. However, the number of records per table will never exceed 1.000.000 so using a uuidv4 will be excessive and I want to shave some bits off. This will save a lot, since I'll end up having thousands of tables.

Naively, one could take as suggested, a variable head of a uuidv4 to get a fixed number of random bits. Then, using the birthday-paradox, you could calculate the collision-probability.

For instance, 1.000.000 ids encoded with 72 bits random data, would give a small enough chance of collision of 1.05* 10^-10 (see calc) This could be encoded in 12 chars (base64), which would give nice enough URLs.

Question of course is if an arbitrary head of an uuidv4 could be considered random (enough).

Btw: question based on this HN post


Solution

  • A random UUID has no special properties that are impossible to produce with an "ordinary" random-number generator. It's just 6 fixed bits to identify it as a UUIDv4, and then 122 random bits glued together. If you decide that you want some other number of random bits, then generate however many you want; there's not much point in creating a UUID instead and then throwing a big chunk of it away.