
Murmur3 Hash Compatibility Between Go and Python


We have two different libraries, one in Python and one in Go, that need to compute murmur3 hashes identically. Unfortunately, no matter how hard we try, we cannot get them to produce the same result. Judging from this SO question about Java and Python, compatibility isn't necessarily straightforward.

Right now we're using the Python mmh3 and Go github.com/spaolacci/murmur3 libraries.

In Go:

hash := murmur3.New128()
hash.Write([]byte("chocolate-covered-espresso-beans"))
fmt.Println(base64.RawURLEncoding.EncodeToString(hash.Sum(nil)))
// Output: cLHSo2nCBxyOezviLM5gwg

In Python:

import base64
import mmh3

name = "chocolate-covered-espresso-beans"
hash = mmh3.hash128(name.encode('utf-8'), signed=False).to_bytes(16, byteorder='big', signed=False)
print(base64.urlsafe_b64encode(hash).decode('utf-8').strip("="))
# Output: jns74izOYMJwsdKjacIHHA (big byteorder)

hash = mmh3.hash128(name.encode('utf-8'), signed=False).to_bytes(16, byteorder='little', signed=False)
print(base64.urlsafe_b64encode(hash).decode('utf-8').strip("="))
# Output: HAfCaaPSsXDCYM4s4jt7jg (little byteorder)

hash = mmh3.hash_bytes(name.encode('utf-8'))
print(base64.urlsafe_b64encode(hash).decode('utf-8').strip("="))
# Output: HAfCaaPSsXDCYM4s4jt7jg

In Go, murmur3 returns uint64 values, so we assumed signed=False in Python; however, we also tried signed=True and still did not get matching hashes.
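Changing the signed flag cannot fix the mismatch on its own: it only changes how the same 128 bits are read as a Python integer (two's complement when signed=True), so serializing either reading back to 16 bytes yields identical bytes. The sketch below assumes that two's-complement behavior for mmh3 and, so it runs without mmh3 installed, hard-codes the unsigned value behind the big-byteorder output jns74izOYMJwsdKjacIHHA (hex 8e7b3be22cce60c270b1d2a369c2071c).

```python
# Unsigned 128-bit value behind the big-byteorder output jns74izOYMJwsdKjacIHHA,
# hard-coded so this sketch does not need mmh3.
unsigned_value = 0x8E7B3BE22CCE60C270B1D2A369C2071C

# Assumption: with signed=True, mmh3 reports the same bits as a two's-complement
# integer. The top bit is set here, so that reading is negative.
signed_value = unsigned_value - (1 << 128)

unsigned_bytes = unsigned_value.to_bytes(16, byteorder="big", signed=False)
signed_bytes = signed_value.to_bytes(16, byteorder="big", signed=True)

# Both readings serialize to the same 16 bytes, so signedness alone cannot
# change the base64 string; only the byte layout can.
print(unsigned_bytes == signed_bytes)
print(signed_value < 0)
```

Note that because this particular value is negative under signed=True, keeping signed=False in the to_bytes call would raise OverflowError instead of producing a different hash.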

We're open to using different libraries, but we're wondering whether something is wrong with our Go or Python approach to computing a base64-encoded hash from a string. Any help is appreciated.


Solution

  • That first Python result (big byteorder) is almost right.

    >>> import base64, binascii
    >>> binascii.hexlify(base64.b64decode('jns74izOYMJwsdKjacIHHA=='))
    b'8e7b3be22cce60c270b1d2a369c2071c'
    

    In Go:

        x, y := murmur3.Sum128([]byte("chocolate-covered-espresso-beans"))
        fmt.Printf("%x %x\n", x, y)
    

    Results in:

    70b1d2a369c2071c 8e7b3be22cce60c2
    

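    So the two 64-bit halves are swapped: Go's Sum(nil) emits x followed by y, whereas the big-endian Python bytes put y first, because mmh3 returns y in the high 64 bits of its 128-bit integer. To get the same result in Python, swap the halves: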

    import base64
    import mmh3

    name = "chocolate-covered-espresso-beans"
    digest = mmh3.hash128(name.encode('utf-8'), signed=False).to_bytes(16, byteorder='big', signed=False)
    # Swap the two 64-bit halves so the byte order matches Go's Sum(nil)
    digest = digest[8:] + digest[:8]
    print(base64.urlsafe_b64encode(digest).decode('utf-8').strip("="))
    # cLHSo2nCBxyOezviLM5gwg
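If you'd rather make the layout explicit than slice bytes, the same conversion can be written in terms of the two 64-bit words. This is a sketch that assumes, consistently with the outputs above, that mmh3.hash128 returns h1 in the low 64 bits and h2 in the high 64 bits, that mmh3.hash_bytes stores each word little-endian (as on a little-endian machine), and that spaolacci/murmur3's Sum(nil) writes h1 then h2, each big-endian. The helper names are hypothetical, and the inputs are the values from this question so the block runs without mmh3.

```python
import base64

MASK64 = (1 << 64) - 1


def go_sum_from_hash128(value: int) -> bytes:
    """Hypothetical helper: reorder an unsigned mmh3.hash128() result into the
    layout Go's Sum(nil) produced above (h1 big-endian, then h2 big-endian)."""
    h1 = value & MASK64  # low 64 bits, Go's first Sum128 return value
    h2 = value >> 64     # high 64 bits, Go's second Sum128 return value
    return h1.to_bytes(8, "big") + h2.to_bytes(8, "big")


def go_sum_from_hash_bytes(digest: bytes) -> bytes:
    """Hypothetical helper: same conversion starting from mmh3.hash_bytes(),
    assuming its two 8-byte words are little-endian."""
    h1 = int.from_bytes(digest[:8], "little")
    h2 = int.from_bytes(digest[8:], "little")
    return h1.to_bytes(8, "big") + h2.to_bytes(8, "big")


def raw_url_b64(data: bytes) -> str:
    """Mimic Go's base64.RawURLEncoding: URL-safe alphabet, no padding."""
    return base64.urlsafe_b64encode(data).decode("ascii").rstrip("=")


# Values taken from this question, so the sketch needs no mmh3 import.
hash128_value = 0x8E7B3BE22CCE60C270B1D2A369C2071C
hash_bytes_value = base64.urlsafe_b64decode("HAfCaaPSsXDCYM4s4jt7jg==")

print(raw_url_b64(go_sum_from_hash128(hash128_value)))        # cLHSo2nCBxyOezviLM5gwg
print(raw_url_b64(go_sum_from_hash_bytes(hash_bytes_value)))  # cLHSo2nCBxyOezviLM5gwg
```

With mmh3 installed, you would pass mmh3.hash128(name.encode('utf-8'), signed=False) or mmh3.hash_bytes(name.encode('utf-8')) in place of the constants. Splitting into h1 and h2 documents the assumed layout, which helps if you later port the check to another language.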