I have a table like below:
ID | NextID |
---|---|
1 | 5 |
2 | NULL |
3 | 6 |
4 | 7 |
5 | 8 |
6 | 9 |
7 | NULL |
8 | NULL |
9 | 10 |
10 | NULL |
I want to get the ID path:
1 --> 5 --> 8
2
3 --> 6 --> 9 --> 10
4 --> 7
and I tried this:
WITH RECURSIVE path_cte AS (
SELECT ID, NextID, ID::TEXT AS Path
FROM t1
WHERE NextID IS NULL
UNION ALL
SELECT t1.ID, t1.NextID, t1.ID || ' --> ' || cte.Path
FROM t1
JOIN path_cte cte ON t1.NextID = cte.ID
)
SELECT Path
FROM path_cte
ORDER BY ID;
but I got the output:
1 --> 5 --> 8
2
3 --> 6 --> 9 --> 10
4 --> 7
5 --> 8
6 --> 9 --> 10
7
8
9 --> 10
10
I don't want to get those incomplete
paths, but I don't know how to achieve that.
The path is based on ID and NextID, if NextID is NULL
, the path ends, and this ID is the end of the path, the next is to trace forward to get a complete path.
If an ID has neither a preceding ID nor a subsequent ID, it is also considered a path.
The database I use is compatible with PostgreSQL syntax, so you can also use PostgreSQL to demo, thanks :)
If you can afford a versatile, battle-tested, highly optimised (Boost C++ level of optimised) extension which pgrouting
is, what I'd also consider is not trying to solve a long-solved problem. Your data set can be interpreted as a graph, and that's what this extension does: graphs.
The problem you're dealing with two calls. One pgr_connectedComponents()
:
A connected component of an undirected graph is a set of vertices that are all reachable from each other.
alter table t1 add column subgraph integer;
explain analyze verbose
update t1 set subgraph=c.component
from pgr_connectedComponents(
$x$ select row_number()over() as id
, coalesce(id,nextid,1e7::int) as source
, coalesce(nextid,id,1e7::int) as target
, 1 as cost
, 1 as reverse_cost
from t1
$x$) as c
where c.node=t1.id;--4087.475 ms
create index on t1(subgraph,id,nextid)with(fillfactor=100);
That establishes individual subgraphs/trees/chains/disjoined groups of nodes. Once you have that, you can run any routing or spanning tree algorithm you want. E.g. pgr_dijkstra()
:
select string_agg(node::text,' --> ' order by path_seq) as path
from(select subgraph
, id as bottom
, (select id from t1 as t2
where t1.subgraph=t2.subgraph
and t2.nextid is null) as top
from t1
where id is not null
and not exists(select from t1 as t2
where t2.nextid=t1.id)) as top_bottom_per_subgraph
cross join lateral pgr_dijkstra(
format($x$ select row_number()over() as id
, coalesce(id,nextid,2e9::int) as source
, coalesce(nextid,id,2e9::int) as target
, 1 as cost
, 1 as reverse_cost
from t1
where id is not null
and subgraph=%L
$x$, top_bottom_per_subgraph.subgraph)
,top_bottom_per_subgraph.bottom
,top_bottom_per_subgraph.top)
group by subgraph;--4058.037 ms
It does a single run between the top and bottom nodes in each subgraph. In each case you tell pgrouting
what you want and leave worrying how to do that efficiently to decades of C++ freaks.
On 1'000'000
randomised rows with the longest chain/largest group of 100k
, the first part takes 4087ms
, the second 4058ms
.
db<>fiddle, db-fiddle and sqlfiddle don't offer pgrouting
, which is why I couldn't attach an online demo but when I run the answers posted here so far on the same env, these are the results:
variant | score |
---|---|
pgr_topologicalSort on a DAG |
3'546ms |
Dijkstra on subgraphs (this) | 8'145ms |
ValNik | 44'912ms |
hobgoblin | 182'969 ms |
Patrick Chin | 210'174 ms |
Jonas Metzler | timed out after 12h |