Currently I'm trying to develop a PostgreSQL database schema, it has only 2 tables, one of which contains usernames. The difficulty is that for certain reasons I cannot store the username directly, so I have to store the hash (SHA256).
Postgres has a bytea data type, which is an array of bytes, which is technically SHA256.
My question is whether there is a better type to store the hash in terms of speed of searching whether the username exists in the database or not.
Perhaps I should look towards NoSQL solutions, where such a search is faster?
-----Add-----
The answer and comments suggest that it is optimal to use bytea
.
CREATE TABLE users (
username bytea PRIMARY KEY,
somedata text NOT NULL
);
Try it. As you suspected and as already confirmed in comments and the answer from @Laurenz Albe, bytea
wins by being the most compact and fastest to look up.
Here's how storing 400k hashes compares between these:
variant | pg_total_relation_size | pg_size_pretty |
---|---|---|
table1_bytea | 48709632 | 46 MB |
table2_text | 75530240 | 72 MB |
table3_text_collate_c | 75522048 | 72 MB |
And here's how much time it takes to look one up:
variant | avg |
---|---|
hashes_in_bytea | 00:00:00.00003 |
hashes_in_text_collate_c | 00:00:00.000035 |
hashes_in_text | 00:00:00.000042 |
hashes_in_text_otf | 00:00:00.00011 |
hash index because I'm only interested in equality checks
You can't be using hash
for unique indexes, so you'd need to maintain both that and the separate unique index that gets created to handle the UNIQUE
constraint on your username
column. I'd just stick with the default unique you already have.