sqlpostgresqlrecursive-query

Recursive query to figure out record which creates circular dependency


I have a postgress table incoming_records which contains 2 column supervisor_id, emp_id

create table incoming_records(supervisor_id,emp_id)as values
 (null,01) --top level resources with blank supervisor id
,(01,02)
   ,(02,03)
      ,(03,04)
         ,(04,05)
            ,(05,06)
               ,(06,07)
                  ,(07,08)
,(null,10) -- supervisor with 3 employees directly under them
     ,(10,11)
     ,(10,12)
     ,(10,13)
,(14,14) -- loop length 1
,(15,16)
   ,(16,15) -- loop length 2 
,(17,18)
   ,(18,19)
      ,(19,17) -- loop length 3
,(20,21)
   ,(21,22)
      ,(22,23)
         ,(23,20) -- loop length 4
,(null,24) -- supervisor with no employees
,(null,25) -- fork, then an employee with 2 supervisors
     ,(25,26),(26,28)
     ,(25,27),(27,28)
,(null,29) -- supervisor with a null employee
     ,(29,null)
,(null,null) --nobody working for nobody
,(99,30) -- somebody working for someone who doesn't exist
,(99,null); -- nobody working for someone who doesn't exist

This table receives data from an external application and gets fully updated every day and it holds around 60000 records. On basis of these records I need to build an hierarchy of resources. To build the hierarchy I wrote a recursive query and it works fine. But sometime my recursive query goes to infinite loop due to wrong data. Take below scenario where a circular dependency is being created. In this case recursive query goes to infinite loop and it never ends.

supervisor_id emp_id
rt08 rt09
rt09 rt08

I am trying to write a query(written below) which can figure out these faulty records. But till now I did not succeed.

WITH recursive tmp AS (
   SELECT supervisor_id,
          emp_id
   FROM incoming_records
   UNION ALL
   SELECT tmp.supervisor_id,
          t.emp_id
   FROM tmp
       INNER JOIN incoming_records AS t
           ON t.supervisor_id = tmp.emp_id
   WHERE t.supervisor_id = tmp.emp_id) 
select * from tmp

Any suggestion will be appreciated


Solution

  • Cycle detection is a built-in feature of recursive CTEs: demo at db<>fiddle

    1. CYCLE clause informs it which value indicates a loop if it repeats
    2. It's followed by a boolean field, populated for the records with loops, then an array field that holds the whole path/group with the cycle.
    3. Since the third field path accumulates all keys, it can grow quickly on larger and more complex sets. As an optimised, fast and memory-efficient alternative, consider pgrouting.
    WITH recursive tmp AS (
       SELECT supervisor_id,
              emp_id
       FROM incoming_records
       UNION ALL
       SELECT tmp.supervisor_id,
              t.emp_id
       FROM tmp
           INNER JOIN incoming_records AS t
               ON t.supervisor_id = tmp.emp_id
    )CYCLE emp_id SET is_cycle USING path ----- here
    SELECT * FROM tmp
    WHERE is_cycle;
    
    supervisor_id emp_id is_cycle path
    rt14 rt14 t {(rt14),(rt14)}
    rt15 rt16 t {(rt16),(rt15),(rt16)}
    rt16 rt15 t {(rt15),(rt16),(rt15)}
    rt17 rt18 t {(rt18),(rt19),(rt17),(rt18)}
    rt18 rt19 t {(rt19),(rt17),(rt18),(rt19)}
    rt19 rt17 t {(rt17),(rt18),(rt19),(rt17)}
    rt20 rt21 t {(rt21),(rt22),(rt23),(rt20),(rt21)}
    rt21 rt22 t {(rt22),(rt23),(rt20),(rt21),(rt22)}
    rt22 rt23 t {(rt23),(rt20),(rt21),(rt22),(rt23)}
    rt23 rt20 t {(rt20),(rt21),(rt22),(rt23),(rt20)}

    If you want to avoid them rather than select only those, switch to WHERE NOT is_cycle;.

    The SQL-standard CYCLE clause for built-in cycle detection was added in v14.