I want to partition my users into several groups to run an A/B test.
The usual approach is to randomly assign each user to a variant and store that assignment until the A/B test ends. But that forces me to persist the association somewhere, and I want to avoid it.
Since the users are already registered in my application, I would like a function that distributes them uniformly across my test variants so that my A/B test results are not skewed.
Which kind of hash function should I use?
This ACM paper explains that MD5 is a good hash function for getting both a uniform distribution and no correlations between experiments:
We found that only the cryptographic hash function MD5 generated no correlations between experiments. SHA256 (another cryptographic hash) came close, requiring a five-way interaction to produce a correlation. The .NET string hashing function failed to pass even a two-way interaction test.
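As a rough illustration of how this could look in practice, here is a minimal Python sketch that derives a variant from an MD5 hash without storing anything. The function name `assign_variant`, the `experiment_id` parameter, and the `"experiment:user"` key format are my own assumptions for the example, not something taken from the paper. Including the experiment identifier in the hashed key is one plausible way to keep a user's assignments in different experiments from being tied to each other.

```python
import hashlib


def assign_variant(user_id: str, experiment_id: str, num_variants: int = 2) -> int:
    """Map a user to a variant index in [0, num_variants) deterministically.

    Hypothetical helper: the key format below is an assumption for this sketch.
    """
    # Mix the experiment id into the key so that the same user can land in
    # different variants across different experiments.
    key = f"{experiment_id}:{user_id}".encode("utf-8")
    digest = hashlib.md5(key).digest()
    # Interpret the first 8 bytes as an unsigned integer and reduce it to a
    # bucket. With 64 bits the modulo bias is negligible for small bucket counts.
    return int.from_bytes(digest[:8], "big") % num_variants


if __name__ == "__main__":
    for uid in ["1001", "1002", "1003"]:
        print(uid, assign_variant(uid, "checkout-button-test"))
```

Because the result depends only on the user id and the experiment id, calling the function again for the same user always yields the same variant, so no assignment table is needed.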