I want to hash strings of variable length (6-60 characters long) to 32-bit signed integers in order to save disk space in PostgreSQL.
I don't want to encrypt any data, and the hashing function needs to be reproducible and callable from Python. The problem is that I can only find algorithms that produce unsigned integers (like CityHash), which yield values of up to 2^32 - 1 instead of 2^31 - 1.
This is what I have thus far:
from cityhash import CityHash32

string_ = "ALPDAKQKWTGDR"
hashed_string = CityHash32(string_)  # unsigned 32-bit result
print(hashed_string, len(str(hashed_string)))

# Largest value a signed 32-bit integer column can hold
max_ = 2**31 - 1
print(hashed_string > max_)  # True whenever the hash does not fit
Ryan answered the question in the comments. Simply subtract 2147483648 (= 2^31) from the hash result.
CityHash32(string_) - 2**31  # integer arithmetic keeps the result an exact int
or
CityHash64(string_) - 2**63  # math.pow would return a float and lose precision here
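The offset can be wrapped in a small helper so the shift stays in one place. This is only a sketch: the names `to_signed32`, `to_unsigned32` and `INT32_OFFSET` are made up for illustration, and the helper takes any unsigned 32-bit integer, so it does not depend on a particular hash library.

```python
INT32_OFFSET = 2**31  # hypothetical constant name, 2147483648


def to_signed32(unsigned_hash: int) -> int:
    """Shift an unsigned 32-bit value into the signed 32-bit range."""
    if not 0 <= unsigned_hash < 2**32:
        raise ValueError("expected an unsigned 32-bit integer")
    return unsigned_hash - INT32_OFFSET


def to_unsigned32(signed_hash: int) -> int:
    """Undo the shift, recovering the original unsigned value."""
    return signed_hash + INT32_OFFSET


# The two extremes of the unsigned range land on the signed extremes.
print(to_signed32(0))          # smallest signed 32-bit value
print(to_signed32(2**32 - 1))  # largest signed 32-bit value
```

Because the shift is a one-to-one mapping, it neither adds nor removes collisions; with the cityhash package it would be called as `to_signed32(CityHash32(string_))`.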
Ryan also mentioned that using SHA-512 and truncating the result to the desired number of digits will lead to fewer collisions than the method above.
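One possible reading of the SHA-512 suggestion, using only the standard library, is sketched below. The function name `sha512_to_int32` is hypothetical, and keeping the first four bytes in big-endian order is an assumption, since the comment did not say which part of the digest to keep. The sketch makes no claim about collision rates.

```python
import hashlib


def sha512_to_int32(text: str) -> int:
    """Hash text with SHA-512 and keep the first 4 bytes as a signed 32-bit int."""
    digest = hashlib.sha512(text.encode("utf-8")).digest()
    # signed=True interprets the 4 bytes as two's complement,
    # so the result already fits a PostgreSQL integer column.
    return int.from_bytes(digest[:4], byteorder="big", signed=True)


value = sha512_to_int32("ALPDAKQKWTGDR")
print(value, -2**31 <= value <= 2**31 - 1)
```

The result is reproducible across runs and machines because SHA-512 is unsalted, unlike Python's built-in `hash()` for strings.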