How to get the complete path with CTE

I have a table like below:

ID	NextID
1	5
2	NULL
3	6
4	7
5	8
6	9
7	NULL
8	NULL
9	10
10	NULL

I want to get the ID path:

1 --> 5 --> 8
2
3 --> 6 --> 9 --> 10
4 --> 7

and I tried this:

WITH RECURSIVE path_cte AS (
  SELECT ID, NextID, ID::TEXT AS Path
  FROM t1
  WHERE NextID IS NULL
  UNION ALL
  SELECT t1.ID, t1.NextID, t1.ID || ' --> ' || cte.Path
  FROM t1
  JOIN path_cte cte ON t1.NextID = cte.ID
)
SELECT Path
FROM path_cte
ORDER BY ID;

but I got the output:

1 --> 5 --> 8
2
3 --> 6 --> 9 --> 10
4 --> 7
5 --> 8
6 --> 9 --> 10
7
8
9 --> 10
10

I don't want to get those incomplete paths, but I don't know how to achieve that.

The path is based on ID and NextID, if NextID is NULL, the path ends, and this ID is the end of the path, the next is to trace forward to get a complete path.

If an ID has neither a preceding ID nor a subsequent ID, it is also considered a path.

The database I use is compatible with PostgreSQL syntax, so you can also use PostgreSQL to demo, thanks :)

Solution

If you can afford a versatile, battle-tested, highly optimised (Boost C++ level of optimised) extension which pgrouting is, what I'd also consider is not trying to solve a long-solved problem. Your data set can be interpreted as a graph, and that's what this extension does: graphs.

The problem you're dealing with two calls. One pgr_connectedComponents():

A connected component of an undirected graph is a set of vertices that are all reachable from each other.

alter table t1 add column subgraph integer;

explain analyze verbose
update t1 set subgraph=c.component
from pgr_connectedComponents(
  $x$ select row_number()over() as id
           , coalesce(id,nextid,1e7::int) as source
           , coalesce(nextid,id,1e7::int) as target
           , 1      as cost
           , 1      as reverse_cost
      from t1 
  $x$) as c
where c.node=t1.id;--4087.475 ms

create index on t1(subgraph,id,nextid)with(fillfactor=100);

That establishes individual subgraphs/trees/chains/disjoined groups of nodes. Once you have that, you can run any routing or spanning tree algorithm you want. E.g. pgr_dijkstra():

select string_agg(node::text,' --> ' order by path_seq) as path
from(select subgraph
          , id as bottom
          , (select id from t1 as t2 
             where t1.subgraph=t2.subgraph
             and t2.nextid is null) as top
     from t1 
     where id is not null 
       and not exists(select from t1 as t2 
                      where t2.nextid=t1.id)) as top_bottom_per_subgraph
cross join lateral pgr_dijkstra(
      format($x$ select row_number()over() as id
                   , coalesce(id,nextid,2e9::int) as source
                   , coalesce(nextid,id,2e9::int) as target
                   , 1 as cost
                   , 1 as reverse_cost
              from t1
              where id is not null
                and subgraph=%L 
             $x$, top_bottom_per_subgraph.subgraph)
    ,top_bottom_per_subgraph.bottom
    ,top_bottom_per_subgraph.top)
group by subgraph;--4058.037 ms

It does a single run between the top and bottom nodes in each subgraph. In each case you tell pgrouting what you want and leave worrying how to do that efficiently to decades of C++ freaks.

On 1'000'000 randomised rows with the longest chain/largest group of 100k, the first part takes 4087ms, the second 4058ms.

db<>fiddle, db-fiddle and sqlfiddle don't offer pgrouting, which is why I couldn't attach an online demo but when I run the answers posted here so far on the same env, these are the results:

variant	score
`pgr_topologicalSort` on a DAG	3'546ms
Dijkstra on subgraphs (this)	8'145ms
ValNik	44'912ms
hobgoblin	182'969 ms
Patrick Chin	210'174 ms
Jonas Metzler	timed out after 12h