java murmurhash

Zero-Allocation-Hashing murmur3: hashChars() and hashBytes() produce different output


I am not sure whether I am using the murmur3 function from OpenHFT's zero-allocation-hashing correctly, but hashChars() and hashBytes() return different results for the same string:

// Using zero-allocation-hashing 0.16  
String input = "abc123";
System.out.println(LongHashFunction.murmur_3().hashChars(input));
System.out.println(LongHashFunction.murmur_3().hashBytes(input.getBytes(StandardCharsets.UTF_8)));

Output:

-4878457159164508227
-7432123028918728600

The second value matches what the Guava library produces.

Which function should be used for String inputs?

Shouldn't both functions produce the same result?

Update:

How can I get the same output as:

Hashing.murmur3_128().newHasher().putString(input, Charsets.UTF_8).hash().asLong();
Hashing.murmur3_128().newHasher().putString(input, Charsets.UTF_8).hash().toString()

using the zero-allocation-hashing library, which seems to be faster than Guava?


Solution

  • Your assumption regarding UTF-8 is not correct. hashChars() hashes the string's 16-bit chars rather than its UTF-8 bytes, so the matching byte encoding is StandardCharsets.UTF_16LE:

    String input = "abc123";
    
    System.out.println(LongHashFunction.murmur_3().hashChars(input));
    System.out.println(LongHashFunction.murmur_3().hashBytes(
      input.getBytes(StandardCharsets.UTF_16LE)
    ));
    

    gives:

    -4878457159164508227
    -4878457159164508227
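
To illustrate why the two encodings cannot hash to the same value, here is a minimal sketch that uses only the JDK (no hashing library). It shows that the UTF-8 and UTF-16LE byte sequences of the same string differ in length, and that the UTF-16LE bytes are exactly the string's chars written low byte first. The class name CharEncodingDemo and the helper charsAsLittleEndianBytes are hypothetical names made up for this illustration, not part of any library.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class CharEncodingDemo {
    // Hypothetical helper: rebuilds the UTF-16LE bytes by hand,
    // writing the low byte of each char first, then the high byte.
    static byte[] charsAsLittleEndianBytes(String s) {
        byte[] out = new byte[s.length() * Character.BYTES];
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            out[2 * i] = (byte) c;            // low 8 bits
            out[2 * i + 1] = (byte) (c >> 8); // high 8 bits
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "abc123";
        byte[] utf8 = input.getBytes(StandardCharsets.UTF_8);
        byte[] utf16le = input.getBytes(StandardCharsets.UTF_16LE);

        System.out.println(utf8.length);    // 6: one byte per ASCII character
        System.out.println(utf16le.length); // 12: two bytes per char
        System.out.println(Arrays.equals(utf16le, charsAsLittleEndianBytes(input))); // true
    }
}
```

Since a hash function only sees bytes, feeding it 6 bytes versus 12 bytes is expected to give unrelated results.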
    

    Additional Answer

    For the desired:

    Hashing.murmur3_128().newHasher().putString(input, Charsets.UTF_8).hash().asLong();
    

    this:

    LongHashFunction.murmur_3().hashBytes(input.getBytes(StandardCharsets.UTF_8));
    

    seems to work (please test more!)

    The hex string conversion is harder, because the Guava hash really is 128 bits (16 bytes, two longs), whereas LongHashFunction returns only 64 bits.

    Half of the digits I can reproduce with: ...

    thx to:


    With your help (sorry, this is my first encounter with this library), I could finally reproduce it:

    System.out.println("Actual:   " +
        toHexString(
            LongTupleHashFunction.murmur_3().hashBytes(
                input.getBytes(StandardCharsets.UTF_8)
            )
        )
    );
    

    where:

    // Formats each long as 16 lowercase hex digits, least significant byte first.
    private static String toHexString(long[] hashLongs) {
        StringBuilder sb = new StringBuilder(hashLongs.length * Long.BYTES * 2);
        for (long lng : hashLongs) {
            for (int i = 0; i < Long.BYTES; i++) {
                // Shift right by i whole bytes (Byte.SIZE bits each) to select byte i.
                byte b = (byte) (lng >> (i * Byte.SIZE));
                // Emit the high nibble, then the low nibble.
                sb.append(HEX_DIGITS[(b >> 4) & 0xf]).append(HEX_DIGITS[b & 0xf]);
            }
        }
        return sb.toString();
    }
    
    private static final char[] HEX_DIGITS = "0123456789abcdef".toCharArray();
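
The byte order matters here because Guava documents HashCode.toString() as printing the hash bytes in order, and asLong() as reading the first eight bytes in little-endian order. As a sanity check on the formatting alone, the following sketch does the same conversion with a little-endian ByteBuffer from the JDK and runs it on fixed sample values rather than on a real hash. The class name LittleEndianHexDemo and the method toHexLittleEndian are hypothetical names for this illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class LittleEndianHexDemo {
    // Hypothetical helper: writes each long in little-endian order,
    // then prints every byte as two lowercase hex digits.
    static String toHexLittleEndian(long[] longs) {
        ByteBuffer buf = ByteBuffer.allocate(longs.length * Long.BYTES)
                                   .order(ByteOrder.LITTLE_ENDIAN);
        for (long l : longs) {
            buf.putLong(l);
        }
        StringBuilder sb = new StringBuilder(longs.length * Long.BYTES * 2);
        for (byte b : buf.array()) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Sample values chosen so the byte reversal is easy to see.
        long[] sample = {0x0123456789abcdefL, 1L};
        System.out.println(toHexLittleEndian(sample));
        // prints efcdab89674523010100000000000000
    }
}
```

If the hand-written toHexString and this ByteBuffer variant agree on the same long[] input, the remaining question is only whether the two libraries compute the same 128 bits, which is worth testing on more inputs.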