(I had originally asked this question on the DBA StackExchange, but didn't get much activity in it).
In my DB, I have tables for items
and connections
between them, creating a likely sparse graph, with islands:
CREATE TABLE IF NOT EXISTS items (
--------------------------------------------------------
id UUID NOT NULL DEFAULT uuid_generate_v4(),
--------------------------------------------------------
title VARCHAR(300) NOT NULL,
--------------------------------------------------------
CONSTRAINT items_pk
PRIMARY KEY (id)
--------------------------------------------------------
);
CREATE TABLE IF NOT EXISTS connections (
---------------------------------------------------------------------
id UUID NOT NULL DEFAULT uuid_generate_v4(),
---------------------------------------------------------------------
origin_item_id UUID NOT NULL,
destination_item_id UUID NOT NULL,
---------------------------------------------------------------------
title VARCHAR(300) NOT NULL,
---------------------------------------------------------------------
CONSTRAINT origin_item_fk
FOREIGN KEY (origin_item_id)
REFERENCES items (id),
CONSTRAINT destination_item_fk
FOREIGN KEY (destination_item_id)
REFERENCES items (id),
CONSTRAINT connections_pk
PRIMARY KEY (id)
---------------------------------------------------------------------
);
I'm using node-pg
, and I would like to do a breadth traversal from an inital seed (e.g. url parameter on ExpressJS), up to a certain depth (from what I see in the docs, the depth part is probably the easiest).
Ideally, I would like things to be returned in a nested way, but, from reading these docs (WITH
queries), I couldn't figure out how, or if it is even possible.
INSERT INTO items (title)
VALUES ('Software'), ('Pyongyang'), ('Burma'), ('Shenzhen'), ('Jerusalem'), ('Hostage');
INSERT INTO connections (origin_item_id, destination_item_id, title)
VALUES
(
(SELECT id FROM items WHERE title ~ 'Pyongyang'),
(SELECT id FROM items WHERE title ~ 'Shenzhen'),
'Same Author - Guy Delisle - 1'
),
(
(SELECT id FROM items WHERE title ~ 'Burma'),
(SELECT id FROM items WHERE title ~ 'Pyongyang'),
'Same Author - Guy Delisle - 2'
),
(
(SELECT id FROM items WHERE title ~ 'Jerusalem'),
(SELECT id FROM items WHERE title ~ 'Shenzhen'),
'Same Author - Guy Delisle - 3'
),
(
(SELECT id FROM items WHERE title ~ 'Hostage'),
(SELECT id FROM items WHERE title ~ 'Jerusalem'),
'Same Author - Guy Delisle - 4'
),
The recursive query would then result in, for Pyongyang
and maxDepth = 2
— note neither Hostage nor Software should appear then —:
{
"title": "Pyongyang",
"connections": [
{
"title": "Same Author - Guy Delisle - 1",
"destination_item": {
"title": "Shenzhen",
"connections": [
{
"title": "Same Author - Guy Delisle - 2",
"origin_item": {
"title": "Jerusalem"
}
}
]
}
},
{
"title": "Same Author - Guy Delisle - 2",
"origin_item": {
"title": "Burma"
}
}
]
}
(I'm not 100% sure if I've covered all the cases with this example.)
Is this type of nesting even possible in PostgreSQL (with, say, jsonb_build_object
or jsonb_agg
)? How would I do it?
Here is what I've been able to do:
WITH RECURSIVE connections_and_items AS (
SELECT
c.id AS connection_id,
c.title AS connection_title,
c.origin_item_id,
i.title AS origin_item_title,
c.destination_item_id,
it.title AS destination_item_title
FROM connections c
JOIN items i
ON i.id = c.origin_item_id
JOIN items it
ON it.id = c.destination_item_id
),
connected_items AS (
SELECT *
FROM connections_and_items
WHERE origin_item_id = '${itemId}'::UUID
UNION
SELECT c.*
FROM connections_and_items c
JOIN connected_items ci
ON ci.destination_item_id = c.origin_item_id
OR ci.origin_item_id = c.destination_item_id
)
SELECT
ci.connection_id,
ci.connection_title,
jsonb_build_object(
'id', ci.origin_item_id,
'title', ci.origin_item_title
) origin_item,
jsonb_build_object(
'id', ci.destination_item_id,
'title', ci.destination_item_title
) destination_item
FROM connected_items ci
This actually does traverse the graph successfully, in both directions (!). However, I haven't been able to add a maxDepth
constraint (having trouble detecting cycles), and the nesting is still not there.
So, in summary, here's what I have and what I'm missing:
Very late to the party, but I think the easiest way to solve this is with a recursive function. This can take parameters of the starting node and the maximum depth to search to. This function also has a third input of seen nodes to prevent cycling along any branch of the tree:
CREATE OR REPLACE FUNCTION build_node(node INT, max_depth INT, seen INT[] default ARRAY[]::INT[])
RETURNS JSONB
AS $$
DECLARE
_conns JSONB;
BEGIN
IF max_depth = 0 THEN
RETURN jsonb_build_object(
'title', (SELECT title FROM items WHERE id = node)
);
END IF;
seen = ARRAY_APPEND(seen, node);
SELECT jsonb_agg(json_build_object('title', title,
CASE WHEN origin_item_id = node THEN 'destination_item' ELSE 'origin_item' END,
build_node(CASE WHEN origin_item_id = node THEN destination_item_id ELSE origin_item_id END, max_depth - 1, seen)
)
) INTO _conns
FROM connections
WHERE origin_item_id = node AND NOT destination_item_id = ANY(seen)
OR destination_item_id = node AND NOT origin_item_id = ANY(seen);
IF jsonb_array_length(_conns) > 0 THEN
RETURN jsonb_build_object(
'title', (SELECT title FROM items WHERE id = node),
'connections', _conns
);
ELSE
RETURN jsonb_build_object(
'title', (SELECT title FROM items WHERE id = node)
);
END IF;
END;
$$
LANGUAGE plpgsql
Sample usage:
SELECT build_node(2, 2)
Output (for your sample data):
{
"title": "Pyongyang",
"connections": [{
"title": "Same Author - Guy Delisle - 1",
"destination_item": {
"title": "Shenzhen",
"connections": [{
"title": "Same Author - Guy Delisle - 3",
"origin_item": {
"title": "Jerusalem"
}
}]
}
}, {
"title": "Same Author - Guy Delisle - 2",
"origin_item": {
"title": "Burma"
}
}]
}