I am attempting to generate a random string of 5 chars. Every implementation I've tried generates duplicates after a very small sample size (10k-40k). While I understand that programmatically its impossible to generate true random strings I was surprised to see duplicates appearing at such a high frequency.
I have tried several implementations (and variations of those) without luck. Each one generates a duplicate after ~40k strings at most. Given that there are 380,204,032 unique combinations in a 5 char string consisting of [A-Z a-z] I was expecting to be able to generate significantly more strings before encountering a duplicate.
After searching around I found a couple of good sources that were the basis for my implementations
The 2nd link in particular caught my eye as the author mentioned using the "crypto/rand"
package was better at avoiding duplicates. However it seemed to make little difference in how many strings I was able to generate before encountering a duplicate.
Other variations I tried
rand.NewSource
after each char (instead of each string)rand.NewSource
and using the output of the 1st to seed the 2nd, 3rd, etc.Can anyone provide some insight into why these random strings aren't so random and some steps I might be able to take to generate at least 1M strings before encountering a duplicate?
I know I could use a 3rd party library but I would like to avoid that.
func Test_RandomString(t *testing.T) {
tests := map[string]struct {
Length int
UniuqeValues int
}{
"5 x 1,000,000": {
Length: 5,
UniuqeValues: 1_000_000,
},
}
for name, test := range tests {
t.Run(name, func(t *testing.T) {
actual := utils.RandomString(test.Length)
assert.Equal(t, test.Length, len(actual))
values := make(map[string]struct{})
for count := 0; count < test.UniuqeValues; count++ {
value := utils.RandomString(test.Length)
_, found := values[value]
if found {
t.Fatalf("duplicate value found after %v: %v", count, value)
}
values[value] = struct{}{}
}
})
}
}
func RandomString_CryptoRand(length int) string {
const letters = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
var randomStr []byte = make([]byte, length)
for i := 0; i < length; i++ {
idx, _ := rand.Int(rand.Reader, big.NewInt(int64(len(letters))))
randomStr[i] = letters[idx.Int64()]
}
return string(randomStr)
}
func RandomString_MathRand(length int) string {
var src = rand.NewSource(time.Now().UnixNano())
const letterBytes = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
sb := strings.Builder{}
sb.Grow(length)
for idx := 0; idx < length; idx++ {
sb.WriteByte(letterBytes[int64(src.Int63())%int64(len(letterBytes))])
}
return sb.String()
}
This is the birthday paradox. You mistakenly assumed that the probability of duplicates is linearly proportional to the pool-size of 525, while in reality the relationship is proportional to the square root. Sqrt(380204032) is 19498 and change, which is pretty much spot-on for what you observed.