Say you want to have a set of 1- to 2-digit hexadecimal numbers, so 256 numbers. Just using a small set to get at the problem, but it would work with any sized string.
So you have a potential N
or 256 numbers in this case. You are going to "generate" a new ID for every new data record that comes your way. So it starts of and randomly gives you af
, then 1d
, then 8a
, etc.
The straightforward naïve way to do this is to just simply generate all the numbers in order, then shuffle them, and just pop from the set. This works fine when you only have 256 numbers. But if you have millions or billions of numbers it is impractical as you might have a lot of waisted generated IDs not being used for long periods of time. I would like to avoid this.
So my question is what is the or a fastest way to create a unique key string like this, without generating all of them in advance, and without going in order just incrementing by 1 or whatnot. That is, the key should be seemingly random.
One way I can imagine is using a trie to store the already used/generated values. Then when you are to get a new value, you generate a random one, then check the trie to see if it's already used. I have no idea how to tell how efficient this is though, but it seems like it would be very bad performing once you start running out of IDs and are down to the last few ones in the set. You would generate lots of already generated IDs, and traverse the trie for each, so it would be slow.
I am wondering if there is a more efficient way of doing this, without generating them all in advance. Also, the data records won't be used in figuring out the ID, as the records might be extremely large and complex.
Maybe there is a way to sort of randomly traverse (and generate) a trie at once, and in that way generate the ID since you end up at a unique random place in the trie. Something along those lines perhaps, I don't know.
Also, I am not sophisticated with hashing so I don't know if there would be any good methods with that.
I assume that you could generate sequential IDs; that is, that you have a reliable way of knowing exactly how many IDs have been generated to date. Then it is sufficient to encrypt this count with any reasonably fast encryption algorithm.
The encryption would be done on the count as a binary number, and the encrypted result with most algorithms would be the same size, also binary. If desired, you could base-64 or hex encode the result to make it easier to use as a character string.
Since encryption must be a bijection (that is, a one-to-one mapping) in order for decryption to be possible, this is guaranteed to produce a different result each time until the total ID count overflows. If it is a reasonable encryption function, then the result will appear random (otherwise the cipher would be vulnerable).