[SOLVED] Perfect hash function for integer sequence

Given a sequence of integers, for example 1…999_999, I need to map each integer to another integer in the same set, one-to-one and pseudo-randomly (the mapping depends on a seed). The hash function must scale to large sets, so shuffling and storing all values in memory is not an option. Is there a good way to do this?

Some examples:


// 1..3 seq
lowerBound = 1;
upperBound = 3;

seed = 1

h1 = makeHashFn(lowerBound, upperBound, seed)

h1(1) // 2
h1(2) // 3
h1(3) // 1

newSeed = 2

h2 = makeHashFn(lowerBound, upperBound, newSeed)

h2(1) // 3
h2(2) // 1
h2(3) // 2

Solution

As far as I know, it's not possible to do this without using some memory.

If you can tolerate collisions (two inputs mapping to the same output), a stateless approach is possible. Otherwise, you can't really have a mapping that is both random and stateless.
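To illustrate the stateless case, here is a minimal sketch, assuming collisions are acceptable. The name makeLossyHashFn and the integer-mixing constants are illustrative assumptions, not something from the question. It scrambles the input with the seed and reduces the result modulo the range size, so it needs no table, but it is generally not 1:1.

```javascript
// Stateless, seeded mapping into [lowerBound, upperBound].
// NOTE: this is NOT a permutation; distinct inputs can collide.
// makeLossyHashFn is a hypothetical name used only for this sketch.
function makeLossyHashFn(lowerBound, upperBound, seed) {
  const n = upperBound - lowerBound + 1; // size of the range

  return (x) => {
    // Illustrative 32-bit integer mixing steps (not a vetted hash).
    let h = (x ^ seed) >>> 0;
    h = Math.imul(h ^ (h >>> 16), 0x45d9f3b);
    h = Math.imul(h ^ (h >>> 16), 0x45d9f3b);
    h = (h ^ (h >>> 16)) >>> 0; // force an unsigned 32-bit value

    // Reducing modulo n is the step that introduces collisions.
    return lowerBound + (h % n);
  };
}

const lossy = makeLossyHashFn(1, 999999, 1);
console.log(lossy(1), lossy(2), lossy(3));
```

Every output stays inside the range and depends only on the input and the seed, but some values in the range will be hit more than once and others never.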

What you can do, though, is randomly shuffle a list of all the indices. That costs only 4 or 8 bytes per list element (roughly 4 MB or 8 MB for the 999,999-element example), which is fairly reasonable for most applications.

If you use a deterministic seeded RNG to shuffle the indices, the result will be the same every time. In that case you don't need to keep the shuffled indices around permanently: you can regenerate them when needed and discard them afterward, depending on your memory requirements.
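A minimal sketch of the seeded shuffle, following the makeHashFn(lowerBound, upperBound, seed) shape from the question. JavaScript's Math.random cannot be seeded, so this sketch assumes a small seedable generator (mulberry32, chosen here purely for illustration) and a Fisher-Yates shuffle over a Uint32Array, which costs 4 bytes per element.

```javascript
// Small seedable PRNG (mulberry32). Returns floats in [0, 1).
// Assumption: any deterministic seeded generator would do here.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6D2B79F5) | 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Builds a 1:1 mapping of [lowerBound, upperBound] onto itself.
// The whole permutation is held in memory (4 bytes per element).
function makeHashFn(lowerBound, upperBound, seed) {
  const n = upperBound - lowerBound + 1;
  const perm = new Uint32Array(n);
  for (let i = 0; i < n; i++) perm[i] = i;

  // Fisher-Yates shuffle driven by the seeded generator.
  const rand = mulberry32(seed);
  for (let i = n - 1; i > 0; i--) {
    const j = Math.floor(rand() * (i + 1));
    const tmp = perm[i];
    perm[i] = perm[j];
    perm[j] = tmp;
  }

  return (x) => {
    if (!Number.isInteger(x) || x < lowerBound || x > upperBound) {
      throw new RangeError("input outside [" + lowerBound + ", " + upperBound + "]");
    }
    return lowerBound + perm[x - lowerBound];
  };
}

const h1 = makeHashFn(1, 3, 1);
console.log(h1(1), h1(2), h1(3));
```

Calling makeHashFn again with the same bounds and seed rebuilds the same mapping, so the table can be dropped and regenerated later instead of being stored. Note that a value may map to itself; a shuffle guarantees a permutation, not that every element moves.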

There are no silver bullets: every solution to this problem has significant tradeoffs. If you have a massive database with billions of entries, it's probably better to step back and redefine the problem in a more efficient way.