I have a table like this:
CREATE TABLE "users"
(
id serial PRIMARY KEY,
town text NOT NULL,
street text NOT NULL,
building text
);
I want to store entries that are unique by the 3 columns "town", "street" and "building". If the third column "building" is null, it should be impossible to store any other row with the same "town" and "street", no matter what is in the column "building".
So this should work:
---- example 1:
INSERT INTO "users" ("town", "street", "building") VALUES ('t', 's', 8);
INSERT INTO "users" ("town", "street", "building") VALUES ('t', 's', 9);
---- example 2:
INSERT INTO "users" ("town", "street", "building") VALUES ('t', 's', null);
But this should not:
---- example 1:
INSERT INTO "users" ("town", "street", "building") VALUES ('t', 's', 8);
INSERT INTO "users" ("town", "street", "building") VALUES ('t', 's', 9);
INSERT INTO "users" ("town", "street", "building") VALUES ('t', 's', null);
---- example 2:
INSERT INTO "users" ("town", "street", "building") VALUES ('t', 's', 8);
INSERT INTO "users" ("town", "street", "building") VALUES ('t', 's', 8);
I tried to use two partial indexes for this:
CREATE UNIQUE INDEX "two_cols"
ON "users" ("town", "street")
WHERE "building" IS NULL;
CREATE UNIQUE INDEX "three_cols"
ON "users" ("town", "street", "building")
WHERE "building" IS NOT NULL;
but the problem is that each partial index only checks the rows covered by its own predicate. The first index never sees rows where "building" is not null, so a row with a null "building" can still be stored alongside rows with values, which is not what I need. Removing the predicate from the first index makes it impossible to store two rows with the same first two columns but a different third column. Is there a way to solve this problem?
You want NULL to conflict with all values (incl. NULL). But distinct not-null values shall not conflict with each other. Try as you might, you won't cover that with a UNIQUE constraint (or index).
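To see the gap, here is a quick sketch with the closest UNIQUE variant (a hypothetical scratch table users_u, Postgres 15+ for NULLS NOT DISTINCT):
-- even UNIQUE NULLS NOT DISTINCT does not cover it
CREATE TEMP TABLE users_u (
  town text NOT NULL
, street text NOT NULL
, building text
, UNIQUE NULLS NOT DISTINCT (town, street, building)
);
INSERT INTO users_u VALUES ('t', 's', '8');   -- OK
INSERT INTO users_u VALUES ('t', 's', NULL);  -- OK, but per your rules it should conflict with '8'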
This does exactly what you ask for:
-- requires additional module btree_gist
CREATE EXTENSION btree_gist;
-- auxiliary function:
-- builds a single-value int8range from the 64-bit hash of the input;
-- deliberately not STRICT, so NULL input yields the unbounded range
CREATE OR REPLACE FUNCTION texthash_int8range(text)
  RETURNS int8range
  LANGUAGE sql IMMUTABLE PARALLEL SAFE
RETURN int8range(hashtextextended($1, 0), hashtextextended($1, 0), '[]');
CREATE TABLE users (
id serial PRIMARY KEY
, town text NOT NULL
, street text NOT NULL
, building text
, CONSTRAINT address_uni UNIQUE NULLS NOT DISTINCT (town, street, building)
, CONSTRAINT address_with_null_building EXCLUDE USING gist
    (hash_record_extended((town, street), 0) WITH =, texthash_int8range(building) WITH &&)
);
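A quick test with the examples from the question (expected outcomes in comments):
INSERT INTO users (town, street, building) VALUES ('t', 's', '8');   -- OK
INSERT INTO users (town, street, building) VALUES ('t', 's', '9');   -- OK
INSERT INTO users (town, street, building) VALUES ('t', 's', NULL);  -- ERROR:
-- conflicting key value violates exclusion constraint "address_with_null_building"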
A UNIQUE constraint is based on a B-tree index using equality checks at its core. An exclusion constraint is based on other index types - as of Postgres 16, GiST or SP-GiST - and can use additional operators, in particular the "overlaps" operator &&. But that operator is not meant for text values. To get there, hash the text to int8 and build an int8range from it, which allows &&.
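A few probes show the mechanics, using the auxiliary function from above:
SELECT texthash_int8range('8') && texthash_int8range('8');   -- true: same value
SELECT texthash_int8range('8') && texthash_int8range('9');   -- false (barring a hash collision)
SELECT texthash_int8range(NULL) && texthash_int8range('8');  -- true: unbounded range overlaps all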
I use the hash function hashtextextended(). You could use hashtext() to build an int4range instead - smaller and a bit faster, but with increased chances of a hash collision. Both are built-in Postgres functions: undocumented, but reliable.
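If you opt for the smaller hash, the variant might look like this (hypothetical name texthash_int4range, not part of the solution above):
-- int4range variant: smaller and faster, but more collision-prone
CREATE OR REPLACE FUNCTION texthash_int4range(text)
  RETURNS int4range
  LANGUAGE sql IMMUTABLE PARALLEL SAFE
RETURN int4range(hashtext($1), hashtext($1), '[]');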
The exclusion constraint enforces your special flavor of "uniqueness" completely; the added UNIQUE constraint is logically redundant. I kept it anyway: its underlying B-tree index typically helps the performance of many operations on the table. I implemented it with NULLS NOT DISTINCT since that is closer to your case, but it might also just be a plain index with the default NULLS DISTINCT. The optimal set of constraints and indexes depends on details of your setup.
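In brief, the difference for the table above:
INSERT INTO users (town, street, building) VALUES ('u', 'v', NULL);  -- OK
INSERT INTO users (town, street, building) VALUES ('u', 'v', NULL);  -- fails:
-- with NULLS NOT DISTINCT the B-tree index already reports the duplicate;
-- with the default NULLS DISTINCT only the exclusion constraint would catch it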
int8range from text
I encapsulated that task into the (optional) auxiliary function texthash_int8range(text). First, the text is hashed to a bigint (int8) with hashtextextended(). A text NULL is mapped to int8 NULL, which happens to be what we need.
An int8range is built from it, containing only that one value. In range types, NULL means "unbounded", so the range for NULL input becomes the unbounded range, overlapping with all.
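You can observe this directly:
SELECT int8range(42, 42, '[]');      -- [42,43) : a single-value range
SELECT int8range(NULL, NULL, '[]');  -- (,)     : unbounded on both sides
SELECT int8range(NULL, NULL, '[]') && int8range(42, 42, '[]');  -- true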
Now everything falls into place for your flavor of uniqueness. The probability of a hash collision is practically zero for moderately large cardinalities, but still possible. If that's an issue, this solution is not for you.
Since queries will be supported by the additional (faster) B-tree index, the purpose of the exclusion constraint is mostly just to enforce your rules. So I optimized for write performance and storage footprint rather than its utility as a versatile index, and condensed the leading two columns into a single hash with hash_record_extended((town, street), 0). That makes more sense the longer typical values are. For very short text values it might not pay.
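In that case, the uncondensed variant with separate leading columns would look like this in the table definition above (btree_gist then supplies the text equality operator class):
-- alternative constraint: town and street as separate key columns
, CONSTRAINT address_with_null_building EXCLUDE USING gist
    (town WITH =, street WITH =, texthash_int8range(building) WITH &&)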
We need the equality operator for that, which is not typically useful for GiST indexes, as B-tree indexes are better at it. But for the special case of a multicolumn exclusion constraint, we need it anyway. Postgres provides the required operator class(es) with the additional module btree_gist. Install that first.
To optimize, I use the minimal form of an SQL-standard function. (It can be a plain SQL function, too.) But it must be IMMUTABLE, and it should be PARALLEL SAFE.
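For comparison, the same helper written as a plain SQL function:
-- plain SQL function body; behavior is equivalent to the SQL-standard form above
CREATE OR REPLACE FUNCTION texthash_int8range(text)
  RETURNS int8range
  LANGUAGE sql IMMUTABLE PARALLEL SAFE
AS $$
SELECT int8range(hashtextextended($1, 0), hashtextextended($1, 0), '[]');
$$;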