I am trying to retrieve a hierarchical ordered result from a query on an auto referenced table like this:
create table category (
id serial,
-- parent category,
parent_id integer default null, -- null for root category
-- tree control
tree_depth smallint not null, -- calculated
primary key (id),
unique (parent_id, id),
foreign key (parent_id) references category (id)
);
This is a common approach to store a tree of categories except for the need for supporting multiple languages. To that aim, we join a language-dependent table like this:
create table category_lang (
id serial,
-- natural primary key
category_id integer not null,
lang_code char(2) not null,
-- language-dependent data
title varchar(128) not null,
primary key (id),
unique (category_id, lang_code)
);
The tree_depth
column is calculated in a before insert
trigger like this:
create or replace function fn_category__bins () returns trigger as $$
begin
-- calculate tree_depth as parent tree_depth + 1
if new.parent_id is null then
new.tree_depth = 0;
else
new.tree_depth = (select tree_depth from category where id = new.parent_id limit 1) + 1;
end if;
return new;
end;
$$ language plpgsql;
create trigger tg_category__bins before insert on category for each row
execute procedure fn_category__bins();
We populate the tables with easy to read texts in two languages:
insert into category (parent_id, id) values
(null, 1),
(null, 2),
(null, 3),
(1, 11),
(1, 12),
(1, 13),
(2, 21),
(2, 22),
(3, 31),
(21, 211),
(21, 212),
(21, 213);
-- lang_code = 'EN'
insert into category_lang (category_id, title, lang_code) values
(1, 'One', 'EN'),
(2, 'Two', 'EN'),
(3, 'Three', 'EN'),
(11, 'One.One', 'EN'),
(12, 'One.Two', 'EN'),
(13, 'One.Three', 'EN'),
(21, 'Two.One', 'EN'),
(22, 'Two.Two', 'EN'),
(31, 'Three.One', 'EN'),
(211, 'Two.One.One', 'EN'),
(212, 'Two.One.Two', 'EN'),
(213, 'Two.One.Three', 'EN');
-- lang_code = 'ES'
insert into category_lang (category_id, title, lang_code) values
(1, 'Uno', 'ES'),
(2, 'Dos', 'ES'),
(3, 'Tres', 'ES'),
(11, 'Uno.Uno', 'ES'),
(12, 'Uno.Dos', 'ES'),
(13, 'Uno.Tres', 'ES'),
(21, 'Dos.Uno', 'ES'),
(22, 'Dos.Dos', 'ES'),
(31, 'Tres.Uno', 'ES'),
(211, 'Dos.Uno.Uno', 'ES'),
(212, 'Dos.Uno.Dos', 'ES'),
(213, 'Dos.Uno.Tres', 'ES');
A plain query produces a natural result like this:
select * from category tc
left outer join category_lang tl on tl.category_id = tc.id and tl.lang_code = 'EN';
id |parent_id|tree_depth|id|category_id|lang_code|title |
---|---------|----------|--|-----------|---------|-------------|
1| | 0| 1| 1|EN |One |
2| | 0| 2| 2|EN |Two |
3| | 0| 3| 3|EN |Three |
11| 1| 1| 4| 11|EN |One.One |
12| 1| 1| 5| 12|EN |One.Two |
13| 1| 1| 6| 13|EN |One.Three |
21| 2| 1| 7| 21|EN |Two.One |
22| 2| 1| 8| 22|EN |Two.Two |
31| 3| 1| 9| 31|EN |Three.One |
211| 21| 2|10| 211|EN |Two.One.One |
212| 21| 2|11| 212|EN |Two.One.Two |
213| 21| 2|12| 213|EN |Two.One.Three|
when the expected order should be compliant with tree hierarchy and alphabetical order in English (at every depth level), like this:
[Edited to fix the error identified by Erwin]
id |parent_id|tree_depth|id|category_id|lang_code|title |
---|---------|----------|--|-----------|---------|-------------|
1| | 0| 1| 1|EN |One |
11| 1| 1| 4| 11|EN |One.One |
13| 1| 1| 6| 13|EN |One.Three |
12| 1| 1| 5| 12|EN |One.Two |
3| | 0| 3| 3|EN |Three |
31| 3| 1| 9| 31|EN |Three.One |
2| | 0| 2| 2|EN |Two |
21| 2| 1| 7| 21|EN |Two.One |
211| 21| 2|10| 211|EN |Two.One.One |
213| 21| 2|12| 213|EN |Two.One.Three|
212| 21| 2|11| 212|EN |Two.One.Two |
22| 2| 1| 8| 22|EN |Two.Two |
Note that the alphabetical order at every depth forces a different result for Spanish:
[Edited to fix the error identified by Erwin]
id |parent_id|tree_depth|id|category_id|lang_code|title |
---|---------|----------|--|-----------|---------|------------|
2| | 0|14| 2|ES |Dos |
22| 2| 1|20| 22|ES |Dos.Dos |
21| 2| 1|19| 21|ES |Dos.Uno |
212| 21| 2|23| 212|ES |Dos.Uno.Dos |
213| 21| 2|24| 213|ES |Dos.Uno.Tres|
211| 21| 2|22| 211|ES |Dos.Uno.Uno |
1| | 0|13| 1|ES |Uno |
12| 1| 1|17| 12|ES |Uno.Dos |
13| 1| 1|18| 13|ES |Uno.Tres |
11| 1| 1|16| 11|ES |Uno.Uno |
3| | 0|15| 3|ES |Tres |
31| 3| 1|21| 31|ES |Tres.Uno |
I have tried a number of approaches, including a recursive CTE as in https://www.postgresql.org/docs/12/queries-with.html, but none seems to cope with the problem of different orders for different languages.
Any ideas?
... the expected order should be compliant with tree hierarchy and alphabetical order in English (at every depth level),
The additional difficulty is that category_lang(title, lang_code)
is not defined UNIQUE
, so we need to sort by title
and category_id
(as tiebreaker) on every level - which is hard to implement for a dynamic number of levels. An array of composite type can solve the conundrum.
Your displayed results do not currently comply with your requirement. 'Three' should sort before 'Two' according to English sorting rules. The result of the following query implements your requirement:
Create once per database:
CREATE TYPE title_id AS (title varchar(128), id int);
Then use a recursive CTE to generate an array of this composite type according to its path.
WITH RECURSIVE tree AS (
SELECT c.id AS cat_id, c.parent_id, c.tree_depth
, l.id AS lang_id, l.title, l.lang_code
, ARRAY[(l.title, l.category_id)::title_id] AS sort_arr
FROM category c
JOIN category_lang l ON l.category_id = c.id
AND l.lang_code = 'EN'
WHERE c.parent_id IS NULL -- root cat
UNION ALL
SELECT c.id AS cat_id, c.parent_id, c.tree_depth
, l.id AS lang_id, l.title, l.lang_code
, sort_arr || (l.title, l.category_id)::title_id
FROM tree t
JOIN category c ON c.parent_id = t.cat_id
JOIN category_lang l ON l.category_id = c.id
AND l.lang_code = t.lang_code
)
SELECT cat_id, parent_id, tree_depth, lang_id, title
FROM tree
ORDER BY sort_arr;
db<>fiddle here
Closely related with more explanation and details:
COLLATE
?But that's not all. The simple solution sorts by the default collation of your database, which seems inappropriate for different languages.
Every language has its own collation rules, or typically several of them, depending on the region of the world and other political / cultural rules. The "language" is not enough to specify exact rules for sorting. The precise locale matters. Postgres implements collation-aware sorting with the COLLATE
keyword. You would have to store the actual precise collation in addition to the language, and use it to sort properly.
Also, indexes depend on the exact COLLATION
. You might consider multiple partial indexes with different collation. Lots of tricky stuff that goes beyond the scope of this question. See:
Your trigger is not safe against race conditions between concurrent writes. For the purpose of this query we do not need a persisted tree_depth
at all. We could easily generate it in the rCTE. Consider dropping the column tree_depth
and the trigger.
A FK constraint seems to be missing in table category_lang
:
, FOREIGN KEY (category_id) REFERENCES category (id)
Consider text
instead of varchar(n)
and char(n)
. See:
Consider an IDENTITY
column instead of serial
: