sqlpostgresqlpostgresql-9.5ltree

In Postgres, when using LTREE, can the ID column be gotten rid of since the `LTREE` path has the ID already as the last label?


Let's say I have a ID TEXT PRIMARY KEY column and this ID is used to create the PATH LTREE column for the row when inserting data. So the ID ends up as the last label in the path.

So, is it a wise idea to get rid of the ID column all-together since the LTREE path has the ID already as the last label?

Can the PATH LTREE be made the primary key?

Is there any harm in this approach?


Solution

  • If you look at the example in the ltree module documentation:

    CREATE TABLE test (path ltree);
    INSERT INTO test VALUES ('Top');
    INSERT INTO test VALUES ('Top.Science');
    INSERT INTO test VALUES ('Top.Science.Astronomy');
    INSERT INTO test VALUES ('Top.Science.Astronomy.Astrophysics');
    INSERT INTO test VALUES ('Top.Science.Astronomy.Cosmology');
    INSERT INTO test VALUES ('Top.Hobbies');
    INSERT INTO test VALUES ('Top.Hobbies.Amateurs_Astronomy');
    INSERT INTO test VALUES ('Top.Collections');
    INSERT INTO test VALUES ('Top.Collections.Pictures');
    INSERT INTO test VALUES ('Top.Collections.Pictures.Astronomy');
    INSERT INTO test VALUES ('Top.Collections.Pictures.Astronomy.Stars');
    INSERT INTO test VALUES ('Top.Collections.Pictures.Astronomy.Galaxies');
    INSERT INTO test VALUES ('Top.Collections.Pictures.Astronomy.Astronauts');
    CREATE INDEX path_gist_idx ON test USING GIST (path);
    CREATE INDEX path_idx ON test USING BTREE (path);
    

    You can see there are two entries with "Astronomy" as leaf value.

    If your ID is always used as the last label, it may be have duplicates in this case (so no longer has the primary key constraints).

    You might be able to replicate the uniqueness constraint using a unique index on the last label:

    CREATE UNIQUE INDEX ON test (subpath(test.path, -1, 1));
    

    However, you won't be able to create a primary key from this index because it depends on an expression.

    I guess it all depends on how you maintain the consistency between your ID and the ltree value.

    It's probably be worth keeping the ID column as a primary key and compute its value automatically (e.g. via trigger) using subpath(..., -1, 1) whenever your ltree value is inserted and updated, or at least have some form of constraint to make sure it's consistent with the path.

    It also depends on what you would do regarding row locks.

    In particular, a unique index on subpath(test.path, -1, 1) would not fully replicate the behaviour of a Primary Key index:

    Currently, the set of columns considered for the UPDATE case are those that have a unique index on them that can be used in a foreign key (so partial indexes and expressional indexes are not considered), but this may change in the future.