sql postgresql infinite-loop recursive-query circular-dependency

Recursive query to figure out record which creates circular dependency

I have a postgress table incoming_records which contains 2 column supervisor_id, emp_id

create table incoming_records(supervisor_id,emp_id)as values
 (null,01) --top level resources with blank supervisor id
,(01,02)
   ,(02,03)
      ,(03,04)
         ,(04,05)
            ,(05,06)
               ,(06,07)
                  ,(07,08)
,(null,10) -- supervisor with 3 employees directly under them
     ,(10,11)
     ,(10,12)
     ,(10,13)
,(14,14) -- loop length 1
,(15,16)
   ,(16,15) -- loop length 2 
,(17,18)
   ,(18,19)
      ,(19,17) -- loop length 3
,(20,21)
   ,(21,22)
      ,(22,23)
         ,(23,20) -- loop length 4
,(null,24) -- supervisor with no employees
,(null,25) -- fork, then an employee with 2 supervisors
     ,(25,26),(26,28)
     ,(25,27),(27,28)
,(null,29) -- supervisor with a null employee
     ,(29,null)
,(null,null) --nobody working for nobody
,(99,30) -- somebody working for someone who doesn't exist
,(99,null); -- nobody working for someone who doesn't exist

This table receives data from an external application and gets fully updated every day and it holds around 60000 records. On basis of these records I need to build an hierarchy of resources. To build the hierarchy I wrote a recursive query and it works fine. But sometime my recursive query goes to infinite loop due to wrong data. Take below scenario where a circular dependency is being created. In this case recursive query goes to infinite loop and it never ends.

supervisor_id	emp_id
rt08	rt09
rt09	rt08

I am trying to write a query(written below) which can figure out these faulty records. But till now I did not succeed.

WITH recursive tmp AS (
   SELECT supervisor_id,
          emp_id
   FROM incoming_records
   UNION ALL
   SELECT tmp.supervisor_id,
          t.emp_id
   FROM tmp
       INNER JOIN incoming_records AS t
           ON t.supervisor_id = tmp.emp_id
   WHERE t.supervisor_id = tmp.emp_id) 
select * from tmp

Any suggestion will be appreciated

Solution

Cycle detection is a built-in feature of recursive CTEs: _{demo at db<>fiddle}

CYCLE clause informs it which value indicates a loop if it repeats
It's followed by a boolean field, populated for the records with loops, then an array field that holds the whole path/group with the cycle.
Since the third field path accumulates all keys, it can grow quickly on larger and more complex sets. As an optimised, fast and memory-efficient alternative, consider pgrouting.

WITH recursive tmp AS (
   SELECT supervisor_id,
          emp_id
   FROM incoming_records
   UNION ALL
   SELECT tmp.supervisor_id,
          t.emp_id
   FROM tmp
       INNER JOIN incoming_records AS t
           ON t.supervisor_id = tmp.emp_id
)CYCLE emp_id SET is_cycle USING path ----- here
SELECT * FROM tmp
WHERE is_cycle;

supervisor_id	emp_id	is_cycle	path
rt14	rt14	t	{(rt14),(rt14)}
rt15	rt16	t	{(rt16),(rt15),(rt16)}
rt16	rt15	t	{(rt15),(rt16),(rt15)}
rt17	rt18	t	{(rt18),(rt19),(rt17),(rt18)}
rt18	rt19	t	{(rt19),(rt17),(rt18),(rt19)}
rt19	rt17	t	{(rt17),(rt18),(rt19),(rt17)}
rt20	rt21	t	{(rt21),(rt22),(rt23),(rt20),(rt21)}
rt21	rt22	t	{(rt22),(rt23),(rt20),(rt21),(rt22)}
rt22	rt23	t	{(rt23),(rt20),(rt21),(rt22),(rt23)}
rt23	rt20	t	{(rt20),(rt21),(rt22),(rt23),(rt20)}

If you want to avoid them rather than select only those, switch to WHERE NOT is_cycle;.

The SQL-standard CYCLE clause for built-in cycle detection was added in v14.