I have a postgress table incoming_records which contains 2 column supervisor_id, emp_id
create table incoming_records(supervisor_id,emp_id)as values
(null,01) --top level resources with blank supervisor id
,(01,02)
,(02,03)
,(03,04)
,(04,05)
,(05,06)
,(06,07)
,(07,08)
,(null,10) -- supervisor with 3 employees directly under them
,(10,11)
,(10,12)
,(10,13)
,(14,14) -- loop length 1
,(15,16)
,(16,15) -- loop length 2
,(17,18)
,(18,19)
,(19,17) -- loop length 3
,(20,21)
,(21,22)
,(22,23)
,(23,20) -- loop length 4
,(null,24) -- supervisor with no employees
,(null,25) -- fork, then an employee with 2 supervisors
,(25,26),(26,28)
,(25,27),(27,28)
,(null,29) -- supervisor with a null employee
,(29,null)
,(null,null) --nobody working for nobody
,(99,30) -- somebody working for someone who doesn't exist
,(99,null); -- nobody working for someone who doesn't exist
This table receives data from an external application and gets fully updated every day and it holds around 60000 records. On basis of these records I need to build an hierarchy of resources. To build the hierarchy I wrote a recursive query and it works fine. But sometime my recursive query goes to infinite loop due to wrong data. Take below scenario where a circular dependency is being created. In this case recursive query goes to infinite loop and it never ends.
supervisor_id | emp_id |
---|---|
rt08 | rt09 |
rt09 | rt08 |
I am trying to write a query(written below) which can figure out these faulty records. But till now I did not succeed.
WITH recursive tmp AS (
SELECT supervisor_id,
emp_id
FROM incoming_records
UNION ALL
SELECT tmp.supervisor_id,
t.emp_id
FROM tmp
INNER JOIN incoming_records AS t
ON t.supervisor_id = tmp.emp_id
WHERE t.supervisor_id = tmp.emp_id)
select * from tmp
Any suggestion will be appreciated
Cycle detection is a built-in feature of recursive CTEs: demo at db<>fiddle
CYCLE
clause informs it which value indicates a loop if it repeatsboolean
field, populated for the records with loops, then an array field that holds the whole path/group with the cycle.path
accumulates all keys, it can grow quickly on larger and more complex sets. As an optimised, fast and memory-efficient alternative, consider pgrouting
.WITH recursive tmp AS (
SELECT supervisor_id,
emp_id
FROM incoming_records
UNION ALL
SELECT tmp.supervisor_id,
t.emp_id
FROM tmp
INNER JOIN incoming_records AS t
ON t.supervisor_id = tmp.emp_id
)CYCLE emp_id SET is_cycle USING path ----- here
SELECT * FROM tmp
WHERE is_cycle;
supervisor_id | emp_id | is_cycle | path |
---|---|---|---|
rt14 | rt14 | t | {(rt14),(rt14)} |
rt15 | rt16 | t | {(rt16),(rt15),(rt16)} |
rt16 | rt15 | t | {(rt15),(rt16),(rt15)} |
rt17 | rt18 | t | {(rt18),(rt19),(rt17),(rt18)} |
rt18 | rt19 | t | {(rt19),(rt17),(rt18),(rt19)} |
rt19 | rt17 | t | {(rt17),(rt18),(rt19),(rt17)} |
rt20 | rt21 | t | {(rt21),(rt22),(rt23),(rt20),(rt21)} |
rt21 | rt22 | t | {(rt22),(rt23),(rt20),(rt21),(rt22)} |
rt22 | rt23 | t | {(rt23),(rt20),(rt21),(rt22),(rt23)} |
rt23 | rt20 | t | {(rt20),(rt21),(rt22),(rt23),(rt20)} |
If you want to avoid them rather than select only those, switch to WHERE NOT is_cycle;
.
The SQL-standard CYCLE
clause for built-in cycle detection was added in v14.