I've created a simple example to illustrate transitive closure using recursive queries in PostgreSQL.
However, something is off with my recursive query. I'm not familiar with the syntax, yet, so this request may be entirely noobish of me. If you run the query, you will see that node 1 repeats itself in the path results. How to fix it?
1
/ \
2 3
/ \ /
4 5 6
/
7
/ \
8 9
create table account(
acct_id INT,
parent_id INT REFERENCES account(acct_id),
acct_name VARCHAR(100),
PRIMARY KEY(acct_id)
);
insert into account (acct_id, parent_id, acct_name) values (1,1,'account 1');
insert into account (acct_id, parent_id, acct_name) values (2,1,'account 2');
insert into account (acct_id, parent_id, acct_name) values (3,1,'account 3');
insert into account (acct_id, parent_id, acct_name) values (4,2,'account 4');
insert into account (acct_id, parent_id, acct_name) values (5,2,'account 5');
insert into account (acct_id, parent_id, acct_name) values (6,3,'account 6');
insert into account (acct_id, parent_id, acct_name) values (7,4,'account 7');
insert into account (acct_id, parent_id, acct_name) values (8,7,'account 8');
insert into account (acct_id, parent_id, acct_name) values (9,7,'account 9');
WITH RECURSIVE search_graph(acct_id, parent_id, depth, path, cycle) AS (
SELECT g.acct_id, g.parent_id, 1,
ARRAY[g.acct_id],
false
FROM account g
UNION ALL
SELECT g.acct_id, g.parent_id, sg.depth + 1,
path || g.acct_id,
g.acct_id = ANY(path)
FROM account g, search_graph sg
WHERE g.acct_id = sg.parent_id AND NOT cycle
)
SELECT path[1] as Child,parent_id as Parent,path || parent_id as path FROM search_graph
ORDER BY path[1],depth;
You can simplify (assuming acct_id
and parent_id
are NOT NULL
):
WITH RECURSIVE search_graph AS (
SELECT parent_id, ARRAY[acct_id] AS path
FROM account
UNION ALL
SELECT g.parent_id, sg.path || g.acct_id
FROM search_graph sg
JOIN account g ON g.acct_id = sg.parent_id
WHERE g.acct_id <> ALL(sg.path)
)
SELECT path[1] AS child
, path[array_upper(path,1)] AS parent
, path
FROM search_graph
ORDER BY path;
The columns acct_id
, depth
, cycle
are just noise in your query.
The WHERE
condition has to stop recursion one step earlier, before the duplicate entry from the top node is in the result. That was an "off-by-one" in your original.
The rest is formatting.
If you know the only possible circle in your graph is a self-reference, we can have that cheaper:
WITH RECURSIVE search_graph AS (
SELECT parent_id, ARRAY[acct_id] AS path, acct_id <> parent_id AS keep_going
FROM account
UNION ALL
SELECT g.parent_id, sg.path || g.acct_id, g.acct_id <> g.parent_id AS keep_going
FROM search_graph sg
JOIN account g ON g.acct_id = sg.parent_id
WHERE keep_going
)
SELECT path[1] AS child
, path[array_upper(path,1)] AS parent
, path
FROM search_graph
ORDER BY path;