gorandom

Golang pseudo random string (lots of duplicates)


I am attempting to generate a random string of 5 chars. Every implementation I've tried generates duplicates after a very small sample size (10k-40k). While I understand that programmatically its impossible to generate true random strings I was surprised to see duplicates appearing at such a high frequency.

I have tried several implementations (and variations of those) without luck. Each one generates a duplicate after ~40k strings at most. Given that there are 380,204,032 unique combinations in a 5 char string consisting of [A-Z a-z] I was expecting to be able to generate significantly more strings before encountering a duplicate.

After searching around I found a couple of good sources that were the basis for my implementations

The 2nd link in particular caught my eye as the author mentioned using the "crypto/rand" package was better at avoiding duplicates. However it seemed to make little difference in how many strings I was able to generate before encountering a duplicate.

Other variations I tried

Can anyone provide some insight into why these random strings aren't so random and some steps I might be able to take to generate at least 1M strings before encountering a duplicate?

I know I could use a 3rd party library but I would like to avoid that.

func Test_RandomString(t *testing.T) {

    tests := map[string]struct {
        Length       int
        UniuqeValues int
    }{
        "5 x 1,000,000": {
            Length:       5,
            UniuqeValues: 1_000_000,
        },
    }

    for name, test := range tests {
        t.Run(name, func(t *testing.T) {
            actual := utils.RandomString(test.Length)
            assert.Equal(t, test.Length, len(actual))

            values := make(map[string]struct{})
            for count := 0; count < test.UniuqeValues; count++ {
                value := utils.RandomString(test.Length)

                _, found := values[value]
                if found {
                    t.Fatalf("duplicate value found after %v: %v", count, value)
                }

                values[value] = struct{}{}
            }

        })
    }
}
func RandomString_CryptoRand(length int) string {

    const letters = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

    var randomStr []byte = make([]byte, length)

    for i := 0; i < length; i++ {

        idx, _ := rand.Int(rand.Reader, big.NewInt(int64(len(letters))))
        randomStr[i] = letters[idx.Int64()]
    }
    return string(randomStr)
}
func RandomString_MathRand(length int) string {
    var src = rand.NewSource(time.Now().UnixNano())
    const letterBytes = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

    sb := strings.Builder{}
    sb.Grow(length)

    for idx := 0; idx < length; idx++ {
        sb.WriteByte(letterBytes[int64(src.Int63())%int64(len(letterBytes))])
    }

    return sb.String()
}

Solution

  • This is the birthday paradox. You mistakenly assumed that the probability of duplicates is linearly proportional to the pool-size of 525, while in reality the relationship is proportional to the square root. Sqrt(380204032) is 19498 and change, which is pretty much spot-on for what you observed.