I have a MySQL database containing MurmurHash2 hashes (stored as unsigned 64-bit integers) that were generated with the UDF that ships with Percona's distribution of MySQL, found here https://github.com/percona/build-test/blob/master/plugin/percona-udf/murmur_udf.cc
My problem is that I now need to generate the same hashes on the PHP side, but I can't find an existing implementation, or tweak one, that produces the same output for the same input.
Things I've tried:
I get a segfault when I run this function call:
var_dump(murmurhash('Hello World'));
This call works fine with https://github.com/kibae/php_murmurhash (the original extension, which produces 32-bit hashes) when I download it and follow its instructions. But once I replaced the hash function (the only edit is in MurmurHash2.cpp, see https://github.com/StirlingMarketingGroup/php_murmurhash/blob/master/MurmurHash2.cpp), the same function call crashes the PHP script.
Here is the PHP function that I've written as a port from the Percona C++ function
function murmurhash2(string $s) : int {
    $len = strlen($s);
    $seed = 0;
    $m = 0x5bd1e995;
    $r = 24;
    $h1 = $seed ^ $len;
    $h2 = 0;
    $i = 0;
    while ($len >= 8) {
        $k1 = ord($s[$i++]);
        $k1 *= $m; $k1 ^= $k1 >> $r; $k1 *= $m;
        $h1 *= $m; $h1 ^= $k1;
        $len -= 4;
        $k2 = ord($s[$i++]);
        $k2 *= $m; $k2 ^= $k2 >> $r; $k2 *= $m;
        $h2 *= $m; $h2 ^= $k2;
        $len -= 4;
    }
    if ($len >= 4) {
        $k1 = ord($s[$i++]);
        $k1 *= $m; $k1 ^= $k1 >> $r; $k1 *= $m;
        $h1 *= $m; $h1 ^= $k1;
        $len -= 4;
    }
    switch ($len) {
        case 3: $h2 ^= ord($s[2]) << 16;
        case 2: $h2 ^= ord($s[1]) << 8;
        case 1: $h2 ^= ord($s[0]);
            $h2 *= $m;
    };
    $h1 ^= $h2 >> 18; $h1 *= $m;
    $h2 ^= $h1 >> 22; $h2 *= $m;
    $h1 ^= $h2 >> 17; $h1 *= $m;
    $h = $h1;
    $h = ($h << 32) | $h2;
    return $h;
}
Within MySQL I get this
select murmur_hash('Hello World'), cast(murmur_hash('Hello World')as unsigned), CONV(cast(murmur_hash('Hello World')as unsigned), 10, 16);
-- -8846466548632298438 9600277525077253178 853B098B6B655C3A
And in PHP I get
var_dump(murmurhash2('Hello World'));
// int(5969224437940092928)
Comparing the results, neither the signed nor the unsigned MySQL value matches my PHP output.
Is there something that can be fixed with either of my previous two approaches, or maybe an already working approach that I can use instead?
I've solved this myself by essentially porting the Percona hashing function from MySQL directly to a PHP extension.
Installation and usage instructions are posted here https://github.com/StirlingMarketingGroup/php-murmur-hash
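For anyone who would rather attempt a pure-PHP port than install the extension, two things the C++ code gets for free have to be emulated by hand. This is only a sketch, under the assumption that the C++ function reads the key in 4-byte little-endian words and relies on unsigned 32-bit wraparound, as the reference MurmurHash2 variants do. The helper names mul32 and read_u32le are made up for illustration, and a 64-bit PHP build (7.1 or later, for the unpack offset argument) is assumed.

```php
<?php
// Hypothetical helper: multiply two unsigned 32-bit values and keep only the
// low 32 bits, like uint32_t multiplication in C. Splitting $a into 16-bit
// halves keeps every intermediate product small enough that PHP never
// silently converts the result to float.
function mul32(int $a, int $b): int {
    $a &= 0xFFFFFFFF;
    $b &= 0xFFFFFFFF;
    $lo = ($a & 0xFFFF) * $b;                   // low half of $a times $b
    $hi = ((($a >> 16) * $b) & 0xFFFF) << 16;   // high half, reduced mod 2^32
    return ($lo + $hi) & 0xFFFFFFFF;
}

// Hypothetical helper: read 4 bytes at $offset as one little-endian
// unsigned 32-bit word, instead of a single byte via ord().
function read_u32le(string $s, int $offset): int {
    return unpack('V', $s, $offset)[1];
}

printf("%08x\n", mul32(0x5bd1e995, 2));          // b7a3d32a
printf("%08x\n", read_u32le('Hello World', 0));  // 6c6c6548
```

A port would then use mul32() wherever the C++ multiplies by m, mask shifted or XORed values with 0xFFFFFFFF, and advance the read offset by 4 per word.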
In MySQL, the Percona extension is used like
select `murmur_hash`('Yeet');
-- -7850704420789372250
And in PHP
php -r 'echo murmur_hash("Yeet");'
// -7850704420789372250
Note that both environments treat the result as a signed integer. In MySQL you can get the unsigned value with cast(`murmur_hash`('Yeet') as unsigned), but PHP doesn't support unsigned integers.
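If the unsigned form is only needed for display or for comparison with MySQL's cast(... as unsigned), one workaround is to format the signed integer as an unsigned decimal string. A minimal sketch, assuming a 64-bit PHP build; the helper name to_unsigned_string is made up for illustration:

```php
<?php
// Hypothetical helper: render a signed 64-bit int as its unsigned decimal
// string. Assumes a 64-bit PHP build (PHP_INT_SIZE === 8).
function to_unsigned_string(int $n): string {
    // %u reinterprets the integer's bits as an unsigned number.
    return sprintf('%u', $n);
}

// Signed value taken from the MySQL output in the question.
echo to_unsigned_string(-8846466548632298438), "\n"; // 9600277525077253178
```

The result is a string, so it is suitable for printing or comparing, not for further integer arithmetic.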